

Elementary Statistics 
Picturing the World 


SEVENTH EDITION 


Ron Larson • Betsy Farber


GLOBAL EDITION 


Ron Larson 


The Pennsylvania State University 
The Behrend College 


Betsy Farber 


Bucks County Community College 




Pearson 


Harlow, England • London • New York • Boston • San Francisco • Toronto • Sydney • Dubai • Singapore • Hong Kong
Tokyo • Seoul • Taipei • New Delhi • Cape Town • Sao Paulo • Mexico City • Madrid • Amsterdam • Munich • Paris • Milan


Director, Portfolio Management: Deirdre Lynch 

Senior Courseware Portfolio Manager: Patrick Barbera 
Acquisitions Editor, Global Edition: Sourabh Maheshwari 
Editorial Assistant: Morgan Danna 

Content Producer: Tamela Ambush 

Assistant Project Editor, Global Edition: Sulagna Dasgupta 
Managing Producer: Karen Wernholm 

Media Producer: Audra Walsh 

Media Production Manager, Global Edition: Vikram Kumar 
Manager, Courseware QA: Mary Durnwald 

Manager, Content Development: Robert Carroll 

Product Marketing Manager: Emily Ockay 

Field Marketing Manager: Andrew Noble 

Marketing Assistants: Shannon McCormack, Erin Rush 
Senior Author Support/Technology Specialist: Joe Vetere 
Manager, Rights and Permissions: Gina Cheselka 
Manufacturing Buyer: Carol Melville, LSC Communications 
Senior Manufacturing & Composition Controller, Global Edition: Angela Hawksbee 
Text Design: Cenveo Publisher Services 

Cover Design: Lumina Datamatics 


Attributions of third party content appear on page P1, which constitutes an extension of this copyright page. 


Pearson Education Limited 
KAO Two 

KAO Park 

Harlow 

CM17 9NA 

United Kingdom 


and Associated Companies throughout the world 
Visit us on the World Wide Web at: www.pearsonglobaleditions.com 
© Pearson Education Limited 2019 


The rights of Ron Larson and Betsy Farber to be identified as the authors of this work have been asserted by them in accordance 
with the Copyright, Designs and Patents Act 1988. 


Authorized adaptation from the United States edition, entitled Elementary Statistics: Picturing the World, Seventh Edition, 
ISBN 978-0-134-68341-6 by Ron Larson and Betsy Farber, published by Pearson Education © 2018. 


All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any 
means, electronic, mechanical, photocopying, recording or otherwise, without either the prior written permission of the publisher or 
a license permitting restricted copying in the United Kingdom issued by the Copyright Licensing Agency Ltd, Saffron House, 6-10 
Kirby Street, London EC1N 8TS.


All trademarks used herein are the property of their respective owners. The use of any trademark in this text does not vest in the 
author or publisher any trademark ownership rights in such trademarks, nor does the use of such trademarks imply any affiliation 
with or endorsement of this book by such owners. 


British Library Cataloguing-in-Publication Data 
A catalogue record for this book is available from the British Library 


ISBN 10: 1-292-26046-7 
ISBN 13: 978-1-292-26046-4 


10 9 8 7 6 5 4 3 2 1


Typeset by Integra Software Services Private Limited 
Printed and bound by Vivar in Malaysia 


CONTENTS

Preface 9
Acknowledgments 14
Index of Applications 16


PART 1 DESCRIPTIVE STATISTICS


1 Introduction to Statistics


Where You've Been and Where You're Going 23


1.1 An Overview of Statistics 24 
1.2 Data Classification 37 
Case Study: Reputations of Companies in the U.S. 38 
1.3 Data Collection and Experimental Design 39 
Activity: Random Numbers 49 
Uses and Abuses: Statistics in the Real World 50 
Chapter Summary 51
Review Exercises 52 
Chapter Quiz 54 
Chapter Test 55 
Real Statistics—Real Decisions: Putting it all together 56 


History of Statistics— Timeline 57 
Technology: Using Technology in Statistics 58 


2 Descriptive Statistics


Where You've Been and Where You're Going 61


2.1 Frequency Distributions and Their Graphs 62 
2.2 More Graphs and Displays 77 
2.3 Measures of Central Tendency 89
Activity: Mean Versus Median 103 
2.4 Measures of Variation 104 
Activity: Standard Deviation 122 
Case Study: Business Size 123 
2.5 Measures of Position 124 
Uses and Abuses: Statistics in the Real World 136 
Chapter Summary 137 
Review Exercises 138
Chapter Quiz 142
Chapter Test 143 
Real Statistics—Real Decisions: Putting it all together 144 


Technology: Parking Tickets 145 
Using Technology to Determine Descriptive Statistics 146 
Cumulative Review: Chapters 1 and 2 148




PART 2 PROBABILITY AND PROBABILITY DISTRIBUTIONS


3 Probability


Where You've Been and Where You're Going 152


3.1 Basic Concepts of Probability and Counting 152
Activity: Simulating the Stock Market 168 
3.2 Conditional Probability and the Multiplication Rule 169
3.3 The Addition Rule 179 
Activity: Simulating the Probability of Rolling a 3 or 4 188 
Case Study: United States Congress 189 
3.4 Additional Topics in Probability and Counting 190 
Uses and Abuses: Statistics in the Real World 200 
Chapter Summary 201
Review Exercises 202 
Chapter Quiz 206 
Chapter Test 207 
Real Statistics—Real Decisions: Putting it all together 208 


Technology: Simulation: Composing Mozart Variations with Dice 209 


4 Discrete Probability Distributions


Where You've Been and Where You're Going 212


4.1 Probability Distributions 212
4.2 Binomial Distributions 223 

Activity: Binomial Distribution 236 

Case Study: Distribution of Number of Hits in Baseball Games 237 
4.3 More Discrete Probability Distributions 238 

Uses and Abuses: Statistics in the Real World 245 

Chapter Summary 246 

Review Exercises 247 

Chapter Quiz 250 

Chapter Test 251

Real Statistics—Real Decisions: Putting it all together 252 


Technology: Using Poisson Distributions as Queuing Models 253 




5 Normal Probability Distributions


Where You've Been and Where You're Going 255


5.1 Introduction to Normal Distributions and the Standard Normal Distribution 256 
5.2 Normal Distributions: Finding Probabilities 268 
5.3 Normal Distributions: Finding Values 274 
Case Study: Birth Weights in America 282 
5.4 Sampling Distributions and the Central Limit Theorem 283 
Activity: Sampling Distributions 296 
5.5 Normal Approximations to Binomial Distributions 297
Uses and Abuses: Statistics in the Real World 306 
Chapter Summary 307 
Review Exercises 308 
Chapter Quiz 312
Chapter Test 313
Real Statistics—Real Decisions: Putting it all together 314 


Technology: Age Distribution in California 315 
Cumulative Review: Chapters 3-5 316 


PART 3 STATISTICAL INFERENCE


6 Confidence Intervals


Where You've Been and Where You're Going 319


6.1 Confidence Intervals for the Mean (σ Known) 320


6.2 Confidence Intervals for the Mean (σ Unknown) 332
Activity: Confidence Intervals for a Mean 340 
Case Study: Marathon Training 341 
6.3 Confidence Intervals for Population Proportions 342 
Activity: Confidence Intervals for a Proportion 351 
6.4 Confidence Intervals for Variance and Standard Deviation 352
Uses and Abuses: Statistics in the Real World 358 
Chapter Summary 359 
Review Exercises 360 
Chapter Quiz 362 
Chapter Test 363 
Real Statistics—Real Decisions: Putting it all together 364 
Technology: United States Foreign Policy Polls 365 


Using Technology to Construct Confidence Intervals 366 




7 Hypothesis Testing with One Sample


Where You've Been and Where You're Going 369


7.1 Introduction to Hypothesis Testing 370
7.2 Hypothesis Testing for the Mean (σ Known) 385
7.3 Hypothesis Testing for the Mean (σ Unknown) 399
Activity: Hypothesis Tests for a Mean 408
Case Study: Human Body Temperature: What's Normal? 409
7.4 Hypothesis Testing for Proportions 410
Activity: Hypothesis Tests for a Proportion 415
7.5 Hypothesis Testing for Variance and Standard Deviation 416

A Summary of Hypothesis Testing 424 

Uses and Abuses: Statistics in the Real World 426 

Chapter Summary 427 

Review Exercises 428 

Chapter Quiz 432 

Chapter Test 433 

Real Statistics—Real Decisions: Putting it all together 434 
Technology: The Case of the Vanishing Women 435 
Using Technology to Perform Hypothesis Tests 436 


8 Hypothesis Testing with Two Samples


Where You've Been and Where You're Going 439


8.1 Testing the Difference Between Means (Independent Samples, σ₁ and σ₂ Known) 440
8.2 Testing the Difference Between Means (Independent Samples, σ₁ and σ₂ Unknown) 450
Case Study: How Protein Affects Weight Gain in Overeaters 458
8.3 Testing the Difference Between Means (Dependent Samples) 459
8.4 Testing the Difference Between Proportions 469

Uses and Abuses: Statistics in the Real World 476 

Chapter Summary 477 

Review Exercises 478 

Chapter Quiz 482 

Chapter Test 483 

Real Statistics—Real Decisions: Putting it all together 484 

Technology: Tails over Heads 485 

Using Technology to Perform Two-Sample Hypothesis Tests 486 

Cumulative Review: Chapters 6-8 488 




PART 4 MORE STATISTICAL INFERENCE


9 Correlation and Regression


Where You've Been and Where You're Going 492


9.1 Correlation 492 

Activity: Correlation by Eye 507 
9.2 Linear Regression 508 

Activity: Regression by Eye 518 


Case Study: Correlation by Body Measurements 519 
9.3 Measures of Regression and Prediction Intervals 520 
9.4 Multiple Regression 531
Uses and Abuses: Statistics in the Real World 536 
Chapter Summary 537 
Review Exercises 538 
Chapter Quiz 542 
Chapter Test 543 
Real Statistics—Real Decisions: Putting it all together 544 


Technology: Nutrients in Breakfast Cereals 545 


10 Chi-Square Tests and the F-Distribution


Where You've Been and Where You're Going 547


10.1 Goodness-of-Fit Test 548 
10.2 Independence 558 
Case Study: Food Safety Survey 570 
10.3 Comparing Two Variances 571 
10.4 Analysis of Variance 580 
Uses and Abuses: Statistics in the Real World 592 
Chapter Summary 593 
Review Exercises 594 
Chapter Quiz 598 
Chapter Test 599 
Real Statistics—Real Decisions: Putting it all together 600 


Technology: Teacher Salaries 601 
Cumulative Review: Chapters 9 and 10 602 




11 Nonparametric Tests (Web Only)*


Where You've Been and Where You're Going


11.1 The Sign Test 
11.2 The Wilcoxon Tests 
Case Study: College Ranks 
11.3 The Kruskal-Wallis Test 
11.4 Rank Correlation 
11.5 The Runs Test 
Uses and Abuses: Statistics in the Real World 
Chapter Summary 
Review Exercises 
Chapter Quiz 
Chapter Test 
Real Statistics—Real Decisions: Putting it all together 
Technology: U.S. Income and Economic Research 


* Available at www.pearsonglobaleditions.com/larson and in MyLab Statistics. 


Appendices 


APPENDIX A Alternative Presentation of the Standard Normal Distribution A1
Standard Normal Distribution Table (0-to-z) A1
Alternative Presentation of the Standard Normal Distribution A2


APPENDIX B Tables A7
Table 1 Random Numbers A7
Table 2 Binomial Distribution A8
Table 3 Poisson Distribution A11
Table 4 Standard Normal Distribution A16
Table 5 t-Distribution A18
Table 6 Chi-Square Distribution A19
Table 7 F-Distribution A20
Table 8 Critical Values for the Sign Test A25
Table 9 Critical Values for the Wilcoxon Signed-Rank Test A25
Table 10 Critical Values for the Spearman Rank Correlation Coefficient A26
Table 11 Critical Values for the Pearson Correlation Coefficient A26
Table 12 Critical Values for the Number of Runs A27


APPENDIX C Normal Probability Plots A28


Answers to the Try It Yourself Exercises A37 
Answers to the Odd-Numbered Exercises A40


Index I1
Photo Credits P1


Preface


Welcome to Elementary Statistics: Picturing the World, Seventh 
Edition. You will find that this textbook is written with a balance 
of rigor and simplicity. It combines step-by-step instruction, 
real-life examples and exercises, carefully developed features, 
and technology that makes statistics accessible to all. 

I am grateful for the overwhelming acceptance of the first 
six editions. It is gratifying to know that my vision of combining 
theory, pedagogy, and design to exemplify how statistics is used 
to picture and describe the world has helped students learn 
about statistics and make informed decisions. 


What's New in this Edition 


The goal of the Seventh Edition was a thorough update of the 
key features, examples, and exercises: 


Examples This edition has 213 examples, over 60% of which 
are new or revised. Also, several of the examples now show an 
alternate solution or a check using technology. 


Technology Examples In addition to showing screen 
displays from Minitab®, Excel®, and the TI-84 Plus, this edition 
also shows screen displays from StatCrunch®. 


Try It Yourself Over 40% of the 213 Try It Yourself exercises
are new or revised. 


Picturing the World Over 50% of these are new or revised. 


Tech Tips New to this edition are technology tips that appear 
in most sections. These tips show how to use Minitab, Excel, the 
TI-84 Plus, or StatCrunch to solve a problem. 


Exercises Over 40% of the more than 2300 exercises are new 
or revised. 


Extensive Chapter Feature Updates Over 60% of the 
following key features are new or revised, making this edition 
fresh and relevant to today’s students: 

• Where You've Been and Where You're Going

• Uses and Abuses: Statistics in the Real World

• Real Statistics-Real Decisions: Putting it all together

• Chapter Technology Project


Revised Content Here is a summary of the content changes. 

• Section 1.1 now has more discussion about populations
and samples, how to identify them, and their relationships
to parameters and statistics. Also, the Venn diagrams
have been redrawn with clearer labeling to help students
distinguish between a population and a sample.

• In Section 1.3, the figure depicting systematic sampling
has been redrawn to depict the sampling process more
clearly.

• Section 2.1 now has more discussion of class widths and
open-ended classes. Also, a figure showing a histogram
and its corresponding frequency polygon was added after
Example 4.

• In Section 2.4, Example 9 was rewritten to explain the use
of an open-ended class.

• Section 2.5 now has a Study Tip discussing outliers and
modified box-and-whisker plots. On pages 146 and 147,
students are shown how to create modified box-and-whisker
plots using technology.

• In Section 3.1, the solutions to the examples were rewritten
to explain why a formula was chosen to find a probability.

• In Chapter 5, examples were revised and Tech Tips were
added to show how to find areas or probabilities using
technology as well as a table.

• In Chapter 6, examples were revised and Tech Tips were
added to show how to find critical values using technology
as well as a table. Also, the exercises in this chapter were
revised to ask more conceptual questions.

• Section 6.2 now has more explanation about why the
t-distribution is needed when σ is unknown. Also, the
flowchart on page 336 was revised to illustrate when it is not
possible to use the normal distribution or the t-distribution
to construct a confidence interval.

• In Chapters 7-9, examples were revised and Tech Tips were
added to show how to find P-values and critical values using
technology as well as a table.

• Section 8.2 now shows the formula for the number of
degrees of freedom for the t-test often used by technology.

• In Section 9.1, the requirements for using a correlation
coefficient r to make an inference about a population have
been revised.
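For readers curious about the degrees-of-freedom formula mentioned for Section 8.2, the sketch below assumes it is the Welch–Satterthwaite approximation that statistical software commonly uses for a two-sample t-test with unequal variances; it is an illustration, not the text's own presentation. The function names `welch_df` and `conservative_df` are hypothetical, and the conservative rule (the smaller of n₁ − 1 and n₂ − 1) is included only as a common hand-calculation alternative for comparison.

```python
def welch_df(s1: float, n1: int, s2: float, n2: int) -> float:
    """Welch-Satterthwaite approximate degrees of freedom (assumed formula).

    s1, s2 are the sample standard deviations; n1, n2 are the sample sizes.
    """
    v1 = s1 ** 2 / n1  # estimated variance of the first sample mean
    v2 = s2 ** 2 / n2  # estimated variance of the second sample mean
    numerator = (v1 + v2) ** 2
    denominator = v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1)
    return numerator / denominator


def conservative_df(n1: int, n2: int) -> int:
    """Smaller of n1 - 1 and n2 - 1, a common choice when using a t-table by hand."""
    return min(n1 - 1, n2 - 1)


if __name__ == "__main__":
    # Hypothetical sample summaries, for illustration only.
    print(round(welch_df(3.0, 12, 8.0, 15), 2), conservative_df(12, 15))
```

The Welch value is usually not a whole number and always lies between the conservative value and n₁ + n₂ − 2, which is one reason table-based and technology-based answers to the same test can differ slightly.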


Features of the Seventh Edition 
Guiding Student Learning 


Where You’ve Been and Where You’re Going Each 
chapter begins with a two-page visual description of a real-life 
problem. Where You’ve Been connects the chapter to topics 
learned in earlier chapters. Where You’re Going gives students 
an overview of the chapter. 


What You Should Learn Each section is organized by 
learning objectives, presented in everyday language in What You 
Should Learn. The same objectives are then used as subsection 
titles throughout the section. 


Definitions and Formulas are clearly presented in 
easy-to-locate boxes. They are often followed by Guidelines, 
which explain In Words and In Symbols how to apply the 
formula or understand the definition. 


Margin Features help reinforce understanding: 

• Study Tips show how to read a table, interpret a result,
help drive home an important interpretation, or connect
different concepts.

• Tech Tips show how to use Minitab, Excel, the TI-84 Plus,
or StatCrunch to solve a problem.

• Picturing the World is a “mini case study” in each section
that illustrates the important concept or concepts of the
section. Each Picturing the World concludes with a question
and can be used for general class discussion or group work.


Examples and Exercises 


Examples Every concept in the text is clearly illustrated 
with one or more step-by-step examples. Most examples have 
an interpretation step that shows the student how the solution 
may be interpreted within the real-life context of the example 
and promotes critical thinking and writing skills. Each example, 
which is numbered and titled for easy reference, is followed 
by a similar exercise called Try It Yourself so students can
immediately practice the skill learned. The answers to these 
exercises are in the back of the book, and the worked-out 
solutions are in the Student’s Solutions Manual available in 
MyLab Statistics. 


Technology Examples Many sections contain an example 
that shows how technology can be used to calculate formulas, 
perform tests, or display data. Screen displays from Minitab, 
Excel, the TI-84 Plus, and StatCrunch are shown. Additional 
screen displays are presented at the ends of selected chapters, 
and detailed instructions are given in separate technology 
manuals available in MyLab Statistics. 


Exercises The exercises give students practice in performing 
calculations, making decisions, providing explanations, and 
applying results to a real-life setting. The section exercises are 
divided into three parts: 

• Building Basic Skills and Vocabulary are short-answer,
true-or-false, and vocabulary exercises carefully written to
nurture student understanding.

• Using and Interpreting Concepts are skill or word
problems that move from basic skill development to more
challenging and interpretive problems.

• Extending Concepts go beyond the material presented in
the section. They tend to be more challenging and are not
required as prerequisites for subsequent sections.


Technology Answers Answers in the back of the book are 
found using calculations by hand and by tables. Answers found 
using technology (usually the TI-84 Plus) are also included when 
there are discrepancies due to rounding. 
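As a rough illustration of why table-based and technology-based answers can differ, the sketch below uses hypothetical values (μ = 100, σ = 15, x = 123) that do not come from the text. A printed table forces z to be rounded to two decimal places before the lookup, while technology keeps full precision. The standard normal CDF is computed with Python's `math.erf`; the helper name `standard_normal_cdf` is an assumption for illustration.

```python
import math


def standard_normal_cdf(z: float) -> float:
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical values chosen only to illustrate rounding.
mu, sigma, x = 100.0, 15.0, 123.0

z_exact = (x - mu) / sigma   # about 1.5333
z_table = round(z_exact, 2)  # a printed table lists z to two decimals

p_table = round(standard_normal_cdf(z_table), 4)  # four-decimal table entry
p_tech = round(standard_normal_cdf(z_exact), 4)   # technology uses the unrounded z

print(f"table: {p_table:.4f}  technology: {p_tech:.4f}")
```

With these values the two routes disagree only in the fourth decimal place, which is the kind of discrepancy the answer key flags.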


Review and Assessment 


Chapter Summary Each chapter concludes with a Chapter 
Summary that answers the question What did you learn? The 
objectives listed are correlated to Examples in the section as 
well as to the Review Exercises. 


Chapter Review Exercises A set of Review Exercises 
follows each Chapter Summary. The order of the exercises 
follows the chapter organization. Answers to all odd-numbered 
exercises are given in the back of the book. 


Chapter Quizzes Each chapter has a Chapter Quiz. The 
answers to all quiz questions are provided in the back of the 
book. For additional help, see the step-by-step video solutions 
available in MyLab Statistics. 


Chapter Tests Each chapter has a Chapter Test. The 
questions are in random order. 

Cumulative Review There is a Cumulative Review after 
Chapters 2, 5, 8, and 10. Exercises in the Cumulative Review are
in random order and may incorporate multiple ideas. Answers 
to all odd-numbered exercises are given in the back of the book. 




Statistics in the Real World 


Uses and Abuses: Statistics in the Real World Each 
chapter discusses how statistical techniques should be used, 
while cautioning students about common abuses. The discussion 
includes ethics, where appropriate. Exercises help students apply 
their knowledge. 


Applet Activities Selected sections contain activities 
that encourage interactive investigation of concepts in the 
lesson with exercises that ask students to draw conclusions. 
The applets are available in MyLab Statistics and at 
www.pearsonglobaleditions.com/larson. 


Chapter Case Study Each chapter has a full-page Case 
Study featuring actual data from a real-world context and 
questions that illustrate the important concepts of the chapter. 


Real Statistics-Real Decisions: Putting it all together
This feature encourages students to think critically and make 
informed decisions about real-world data. Exercises guide 
students from interpretation to drawing of conclusions. 


Chapter Technology Project Each chapter has a 
Technology project using Minitab, Excel, and the TI-84 Plus 
that gives students insight into how technology is used to handle 
large data sets or real-life questions. 


Continued Strong Pedagogy 
from the Sixth Edition 


Versatile Course Coverage The table of contents was 
developed to give instructors many options. For instance, the 
Extending Concepts exercises, applet activities, Real Statistics— 
Real Decisions, and Uses and Abuses provide sufficient content 
for the text to be used in a two-semester course. More commonly, 
I expect the text to be used in a three-credit semester course or a 
four-credit semester course that includes a lab component. In such 
cases, instructors will have to pare down the text’s 46 sections. 


Graphical Approach As with most introductory 
statistics texts, this text begins the descriptive statistics chapter 
(Chapter 2) with a discussion of different ways to display data 
graphically. A difference between this text and many others is 
that it continues to incorporate the graphical display of data 
throughout the text. For example, see the use of stem-and-leaf 
plots to display data on page 409. This emphasis on graphical 
displays is beneficial to all students, especially those utilizing 
visual learning strategies. 


Balanced Approach The text strikes a balance among 
computation, decision making, and conceptual understanding. 
I have provided many Examples, Exercises, and Try It Yourself 
exercises that go beyond mere computation. 


Variety of Real-Life Applications I have chosen 
real-life applications that are representative of the majors of 
students taking introductory statistics courses. I want statistics to 
come alive and appear relevant to students so they understand 
the importance of and rationale for studying statistics. I wanted 
the applications to be authentic—but they also need to be 
accessible. See the Index of Applications on page 16. 


Data Sets and Source Lines The data sets in the book 
were chosen for interest, variety, and their ability to illustrate 
concepts. Most of the 250-plus data sets contain real data with 


source lines. The remaining data sets contain simulated data that 
are representative of real-life situations. All data sets containing 
20 or more entries are available in a variety of formats in MyLab™ 
Statistics. In the exercise sets, the data sets that are available 
electronically are indicated by the icon . 


Flexible Technology Although most formulas in the 
book are illustrated with “hand” calculations, I assume that 
most students have access to some form of technology, such 
as Minitab, Excel, StatCrunch, or the TI-84 Plus. Because 
technology varies widely, the text is flexible. It can be used in 
courses with no more technology than a scientific calculator —or 
it can be used in courses that require sophisticated technology 
tools. Whatever your use of technology, I am sure you agree with 
me that the goal of the course is not computation. Rather, it is to 
help students gain an understanding of the basic concepts and 
uses of statistics. 


Prerequisites Algebraic manipulations are kept to a 
minimum — often I display informal versions of formulas using 
words in place of or in addition to variables. 


Choice of Tables My experience has shown that students 
find a cumulative distribution function (CDF) table easier to 
use than a “0-to-z” table. Using the CDF table to find the area 
under the standard normal curve is a topic of Section 5.1 on 
pages 259-263. Because some teachers prefer to use the “0-to-z” 
table, an alternative presentation of this topic is provided in 
Appendix A. 
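The relationship between the two kinds of table can be sketched as follows, assuming the usual conventions: a CDF table lists the area to the left of z, and a "0-to-z" table lists the area between the mean 0 and |z|. Because half of the total area lies on each side of 0, the two entries differ by 0.5. The function names are hypothetical, and the areas are computed with Python's `math.erf` rather than read from a printed table.

```python
import math


def standard_normal_cdf(z: float) -> float:
    """Cumulative area to the left of z (the entry a CDF table lists)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def area_zero_to_z(z: float) -> float:
    """Area between 0 and |z| (the entry a "0-to-z" table lists)."""
    return standard_normal_cdf(abs(z)) - 0.5


def cdf_from_zero_to_z(z: float) -> float:
    """Convert a 0-to-z table entry back into a cumulative area."""
    half_area = area_zero_to_z(z)
    return 0.5 + half_area if z >= 0 else 0.5 - half_area


if __name__ == "__main__":
    for z in (-1.0, 0.0, 1.96):
        print(z, round(standard_normal_cdf(z), 4), round(area_zero_to_z(z), 4))
```

For example, z = 1.96 has a cumulative area of about 0.9750 and a 0-to-z area of about 0.4750; for negative z, the 0-to-z entry must be subtracted from 0.5 instead of added.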


Page Layout Statistics instruction is more accessible when 
it is carefully formatted on each page with a consistent open 
layout. This text is the first college-level statistics book to be 
written so that, when possible, its features are not split from one 
page to the next. Although this process requires extra planning, 
the result is a presentation that is clean and clear. 


Meeting the Standards 


MAA, AMATYC, NCTM Standards This text answers 
the call for a student-friendly text that emphasizes the uses of 
statistics. My goal is not to produce statisticians but to produce 
informed consumers of statistical reports. For this reason, 
I have included exercises that require students to interpret 
results, provide written explanations, find patterns, and make 
decisions. 


GAISE Recommendations Funded by the American 
Statistical Association, the Guidelines for Assessment and 
Instruction in Statistics Education (GAISE) Project developed 
six recommendations for teaching introductory statistics in a 
college course. These recommendations are: 

• Emphasize statistical literacy and develop statistical thinking.
• Use real data.
• Stress conceptual understanding rather than mere knowledge of procedures.
• Foster active learning in the classroom.
• Use technology for developing conceptual understanding and analyzing data.
• Use assessments to improve and evaluate student learning.


The examples, exercises, and features in this text embrace all of 
these recommendations. 


Technology Resources 


MyLab Statistics Online Course 
(access code required) 


Used by nearly one million students a year, MyLab Statistics is 
the world’s leading online program for teaching and learning 
statistics. MyLab Statistics delivers assessment, tutorials, and 
multimedia resources that provide engaging and personalized 
experiences for each student, so learning can happen in any 
environment. 


Personalized Learning Not every student learns the same 
way or at the same rate. Personalized learning in the MyLab 
gives instructors the flexibility to incorporate the approach that 
best suits the needs of their course and students. 


• Based on their performance on a quiz or test, personalized
homework allows students to focus on just the topics they
have not yet mastered.

• With Companion Study Plan Assignments, you can assign
the Study Plan as a prerequisite to a test or quiz, guiding
students through the concepts they need to master.


Preparedness Preparedness is one of the biggest challenges 
in statistics courses. Pearson offers a variety of content and 
course options to support students with just-in-time remediation 
and key-concept review as needed. 


• Redesign-Ready Course Options Many new course
models have emerged in recent years, as institutions
"redesign" to help improve retention and results. At
Pearson, we’re focused on tailoring solutions to support
your plans and programs.

• Getting Ready for Statistics Questions This question
library contains more than 450 exercises that cover the
relevant developmental math topics for a given section.
These can be made available to students for extra practice
or assigned as a prerequisite to other assignments.


Conceptual Understanding Successful students have 
the ability to apply their statistical ideas and knowledge to 
new concepts and real-world situations. Providing frequent 
opportunities for data analysis and interpretation helps students 
develop the 21st century skills that they need in order to be 
successful in the classroom and workplace. 


• Conceptual Question Library There are 1,000 questions
in the Assignment Manager that require students to apply
their statistical understanding.

• Modern statistics is practiced with technology, and MyLab
Statistics makes learning and using software programs
seamless and intuitive. Instructors can copy data sets
from the text and MyLab Statistics exercises directly
into software such as StatCrunch or Excel®. Students can
also access instructional support tools including tutorial
videos, Study Cards, and manuals for a variety of statistical
software programs including StatCrunch, Excel, Minitab®,
JMP®, R, SPSS, and TI-83/84 calculators.


Motivation Students are motivated to succeed when they 
are engaged in the learning experience and understand the 
relevance and power of statistics. 
• Exercises with Immediate Feedback Homework
and practice exercises in MyLab Statistics regenerate
algorithmically to give students unlimited opportunity for
practice and mastery. Instructors can choose from the many
exercises available for the author’s approach or even
choose additional exercises from other MyLab Statistics
courses. Most exercises include learning aids, such as guided
solutions, sample problems, extra help at point-of-use, and
immediate feedback when students enter incorrect answers.

• Instructors can create, import, and manage online
homework assignments, quizzes, and tests, or start with
sample assignments. All of these are automatically graded,
allowing instructors to spend less time grading and more
time teaching.


Data & Analytics MyLab Statistics provides resources 
to help instructors assess and improve student results. A 
comprehensive gradebook with enhanced reporting functionality 
makes it easier for instructors to manage courses efficiently. 


• Reporting Dashboard Instructors can view, analyze,
and report learning outcomes, gaining the information
they need to keep their students on track. Available via
the Gradebook and fully mobile-ready, the Reporting
Dashboard presents student performance data at the class,
section, and program levels in an accessible, visual manner.
Its fine-grained reports allow instructors and administrators
to compare performance across different courses, across
individual sections, and within each course.

• Item Analysis Instructors can track class-wide
understanding of particular exercises in order to refine
their class lectures or adjust the course/department syllabus.
Just-in-time teaching has never been easier.


Accessibility Pearson works continuously to ensure our 
products are as accessible as possible to all students. We are 
working toward achieving WCAG 2.0 Level AA and Section 508 
standards, as expressed in the Pearson Guidelines for Accessible 
Educational Web Media, www.pearson.com/mylab/statistics/accessibility.


The following feature is new to the MyLab Statistics course of 
this edition: 


UPDATED! Video Program 


Chapter Review Exercises come to life with new review videos 
that help students understand key chapter concepts. Section 
Lecture Videos work through examples and elaborate on key 
objectives. 




StatCrunch 


Integrated directly into MyLab Statistics, StatCrunch® is powerful 
web-based statistical software that allows users to perform 
complex analyses, share data sets, and generate compelling 
reports of their data. 


e Collect Users can upload their own data to StatCrunch or 
search a large library of publicly shared data sets, spanning 
almost any topic of interest. A Featured Data page houses 
the best data sets, making it easy for instructors to use 
current data in their course. Data sets from the text and 
from online homework exercises can also be accessed and 
analyzed in StatCrunch. An online survey tool allows users 
to quickly collect data via web-based surveys. 


• Crunch A full range of numerical and graphical methods
allows users to analyze and gain insights from any data
set. Interactive graphics help users understand statistical 
concepts, and are available for export to enrich reports with 
visual representations of data. 


• Communicate Reporting options help users create a wide
variety of visually appealing representations of their data.


StatCrunch is integrated into MyLab Statistics, but it is also 
available by itself to qualified adopters. StatCrunch is also 
now available on your smartphone or tablet when you visit 
www.statcrunch.com from the device’s browser. For more 
information, visit our website at www.statcrunch.com, or contact 
your Pearson representative. 


NEW! StatCrunch Question Library 


This library of questions provides opportunities for students to 
analyze and interpret data sets in StatCrunch. Instructors can 
assign individual questions from the library by topic or they can 
assign questions from the same data set as a longer assignment 
that spans multiple learning objectives. 


[Screenshot: StatCrunch Question Library item "Cracker Barrel (Question #1)", showing a data set with columns for row number, geographic region, annual revenue, average cost of gasoline, and miles from interstate.]


Resources for Success 


Instructor Resources 


Instructor’s Solutions Manual (downloadable)
Includes complete solutions to all of the exercises (including
exercises in Try It Yourself, Case Study, Technology, Uses and
Abuses, and Real Statistics— Real Decisions sections). It can be 
downloaded from within MyLab Statistics or from Pearson’s 
online catalog, www.pearsonglobaleditions.com. 


PowerPoint Lecture Slides (downloadable) Classroom
presentation slides feature key concepts, examples, and
definitions from this text, along with notes offering suggestions for
presenting the material in class. They can be downloaded from
within MyLab Statistics or from Pearson’s online catalog,
www.pearsonglobaleditions.com.


TestGen TestGen® (www.pearson.com/testgen) enables 
instructors to build, edit, print, and administer tests using a 
computerized bank of questions developed to cover all the 
objectives of the text. TestGen is algorithmically based, allowing 
instructors to create multiple but equivalent versions of the same 
question or test with the click of a button. Instructors can also 
modify test bank questions or add new questions. The software 
and test bank are available for download from Pearson’s online 
catalog, www.pearsonglobaleditions.com. The questions are also 
assignable in MyLab Statistics. 


Learning Catalytics Now included in all MyLab Statistics 
courses, this student response tool uses students’ smartphones, 
tablets, or laptops to engage them in more interactive tasks and 
thinking during lecture. Learning Catalytics™ fosters student 
engagement and peer-to-peer learning with real-time analytics. 
Access pre-built exercises created specifically for statistics. 


Student Resources 
Video Resources 


A comprehensive set of videos tied to the textbook contains
short video clips with solutions to Try It Yourself exercises, 
Chapter Quiz Prep Videos, and Section Lecture Videos. Also, 
StatTalk Videos, hosted by fun-loving statistician Andrew 
Vickers, demonstrate important statistical concepts through 
interesting stories and real-life events. StatTalk Videos include 
assessment questions and an instructor’s guide. 


Student’s Solutions Manual (downloadable) This 
manual includes complete worked-out solutions to all of the Try 
It Yourself exercises, the odd-numbered exercises, and all of the 
Chapter Quiz exercises. This manual can be downloaded from 
MyLab Statistics. 


Technology Manuals for Elementary Statistics (downloadable)
Technology-specific manuals for Graphing Calculator, Excel®,
and Minitab® include tutorial instruction and worked-out
examples from the book. Each manual can be downloaded
from within MyLab Statistics.




ACKNOWLEDGMENTS 


I owe a debt of gratitude to the many reviewers who helped me shape
and refine Elementary Statistics: Picturing the World, Seventh Edition.


Reviewers of the Current Edition 


Karen Benway, University of Vermont 

B.K. Brinkley, Tidewater Community College 

Christine Curtis, Hillsborough Community College—Dale Mabry 
Carrie Elledge, San Juan College 

Jason Malozzi, Lower Columbia College 

Cynthia McGinnis, Northwest Florida State College 

Larry Musolino, Pennsylvania State University 

Cyndi Roemer, Union County College 

Jean Rowley, American Public University and DeVry University 
Heidi Webb, Horry Georgetown Technical College 


Reviewers of the Previous Editions 


Rosalie Abraham, Florida Community College at Jacksonville 
Ahmed Adala, Metropolitan Community College 

Olcay Akman, College of Charleston 

Polly Amstutz, University of Nebraska, Kearney 

John J. Avioli, Christopher Newport University 

David P. Benzel, Montgomery College 

John Bernard, University of Texas— Pan American 

G. Andy Chang, Youngstown State University 

Keith J. Craswell, Western Washington University 

Carol Curtis, Fresno City College 

Dawn Dabney, Northeast State Community College 
Cara DeLong, Fayetteville Technical Community College 
Ginger Dewey, York Technical College 

David DiMarco, Neumann College 

Gary Egan, Monroe Community College 

Charles Ehler, Anne Arundel Community College 
Harold W. Ellingsen, Jr., SUNY — Potsdam 

Michael Eurgubian, Santa Rosa Jr. College 

Jill Fanter, Walters State Community College 

Patricia Foard, South Plains College 

Douglas Frank, Indiana University of Pennsylvania 
Frieda Ganter, California State University 

David Gilbert, Santa Barbara City College 

Donna Gorton, Butler Community College 

Larry Green, Lake Tahoe Community College 

Sonja Hensler, St. Petersburg Jr. College 

Sandeep Holay, Southeast Community College, Lincoln Campus 
Lloyd Jaisingh, Morehead State 


Nancy Johnson, Manatee Community College 

Martin Jones, College of Charleston 

David Kay, Moorpark College 

Mohammad Kazemi, University of North Carolina—Charlotte 

Jane Keller, Metropolitan Community College 

Susan Kellicut, Seminole Community College 

Hyune-Ju Kim, Syracuse University 

Rita Kolb, Catonsville Community College

Rowan Lindley, Westchester Community College 

Jeffrey Linek, St. Petersburg Jr. College 

Benny Lo, DeVry University, Fremont 

Diane Long, College of DuPage 

Austin Lovenstein, Pulaski Technical College 

Rhonda Magel, North Dakota State University 

Mike McGann, Ventura Community College 

Vicki McMillian, Ocean County College 

Lynn Meslinsky, Erie Community College 

Lyn A. Noble, Florida Community College at Jacksonville — 
South Campus 

Julie Norton, California State University —Hayward 

Lynn Onken, San Juan College 

Lindsay Packer, College of Charleston 

Nishant Patel, Northwest Florida State 

Jack Plaggemeyer, Little Big Horn College 

Eric Preibisius, Cuyamaca Community College 

Melonie Rasmussen, Pierce College 

Neal Rogness, Grand Valley State University 

Elisabeth Schuster, Benedictine University 

Jean Sells, Sacred Heart University 

John Seppala, Valdosta State University 

Carole Shapero, Oakton Community College 

Abdullah Shuaibi, Harry S. Truman College 

Aileen Solomon, Trident Technical College 

Sandra L. Spain, Thomas Nelson Community College 

Michelle Strager-McCarney, Penn State—Erie, The Behrend College 

Jennifer Strehler, Oakton Community College 

Deborah Swiderski, Macomb Community College 

William J. Thistleton, SUNY —Institute of Technology, Utica 

Millicent Thomas, Northwest University 

Agnes Tuska, California State University —Fresno 

Clark Vangilder, DeVry University 

Ting-Xiu Wang, Oakton Community College

Dex Whittinghall, Rowan University 

Cathleen Zucco-Teveloff, Rider University 


Many thanks to Betsy Farber for her significant contributions to previous editions of the text. Sadly, Betsy passed away in 2013. 

I would also like to thank the staff of Larson Texts, Inc., who assisted with the production of the book. On a personal level, I am 
grateful to my spouse, Deanna Gilbert Larson, for her love, patience, and support. Also, a special thanks goes to R. Scott O’Neil. 

I have worked hard to make this text a clean, clear, and enjoyable one from which to teach and learn statistics. Despite my best 
efforts to ensure accuracy and ease of use, many users will undoubtedly have suggestions for improvement. I welcome your suggestions. 




Ron Larson, odx@psu.edu 




Acknowledgments of the Global Edition 


Pearson would like to thank the following contributor and reviewers for their help and guidance in creating this Global Edition. 


Contributor of the Current Edition
Vikas Arora

Reviewers of the Current Edition
Hakan Carlqvist, KTH Royal Institute of Technology
Kiran Paul
Abhishek Kumar Umrawal, Purdue University




INDEX OF APPLICATIONS 


Biology and Life Sciences 

Adult weights, 89, 90, 91 

American alligator tail lengths, 
149 

Bacteria, 517 

Birth weights and gestation 
periods, 282 

Black bear weights, 73, 363 

Black cherry tree volume, 534 

Blood types, 152, 177, 182 

Body temperature of humans, 
380, 409 

BRCA1 gene, 175 

Breeds of horses, 35 

Brown trout, 240 

Calves born on a farm, 213 

Cone cells, 175 

Diameters of white oak trees, 287 

Elephant weight, 534 

Elk in Pennsylvania, 40 

Endangered and threatened 
species, 595 

Eye color, 163, 175 

Female femur lengths, A30 

Female fibula lengths, 72 

Female heights, 110, 130 

Female weights, 101 

Fish measurements, 533 

Fisher’s Iris data set, 82 

Fork length of yellowfin tuna, 455 

Genders of children, 202 

Genders of students of university, 
52 

Genetics, 166, 235 

Germination, 116 

Gestational lengths of horses, 143 

Heights and trunk diameters of 
trees, 527, 529 

Heights and weight of students, 141 

Hip and abdomen 
circumferences, 519 

Incubation period for ostrich 
eggs, 489 

Infant crawling age and average 
monthly temperature, 521 

Infant weight, 128 

Length and girth of harbor seals, 
514 

Life spans of lady bugs, 134 

Litter size of Florida panthers, 
354 

Male heights, 73, 110, 130 

Mean birth weight, 488 

Metacarpal bone length and 
height of adults, 603 

Milk produced by cows, 255 

North Atlantic right whale dive 
duration, 407 

Polar bears’ weight, 75, 86 

Rabbits, 240 

Salmon swimming, 159, 171 

Sample height averages, 120 




Sample weight averages, 120 
Shoe size and height, 514 
Species of leaves, 35 
Swimming, 202 

Weights of adults, 478 
Weights of boys, 96 

Weights of cats, 276, 478 
Weights of dogs, 276, 478 
Weights of newborns, 257 


Business 

Accounting department advisory 
committee, 193 

Advertising time and sales, 523 

Annual household expenditures, 
140 

Annual profits, 132 

Annual revenues, 52 

Attracting more customers, 41 

Automobile battery prices, 587 

Bank employee procedure 
preference, 564, 568 

Bankruptcies, 244 

Better Business Bureau 
complaints, 81 

Book prices, 328 

Cauliflower yield, 534 

CEO compensations, 53 

Cell phone prices, 337, 338 

Commercial complex 
shopkeepers, 28 

Company sales, 88 

Cotton consumption, 429 

Cotton production, 429 

Customer ratings, 447 

Distribution of sales, 592 

Effectiveness of advertising, 590 

Employees and revenue of hotel 
and gaming companies, 
514 

Existing home sales, 92 

Flash drive cycles, 406 

Fortune 500 companies, 191 

Fortune 500 revenues, 52 

High-tech earnings, 27 

Highest-paid tech CEOs, 86 

Hotel room rates, 423, 433, 489, 
596 

Hours spent on calls by 
telemarketing firm, 212 

Incomes of CEOs, 271, 280 

Manufacturing defects, 455 

Mangoes sales, 247 

Meal prices at a resort, 433 

Milk production, 539, 540 

Monthly sales, 584 

Natural gas expenditures, 602 

Net profit for Procter & Gamble,
136

Net sales, 543 

New vehicle sales, 528, 529 


Number of calls by telemarketing 
firm, 212 

Office positions, 206 

Office rental rates, 108, 114 

Oil prices, 328 

Online shopping, 98, 304, 342, 344 

President contract, 54 

Printing company departments, 
53 

Product ratings, 467 

Profit and loss analysis, 221 

Purchase value in departmental 
store, 28 

Repair cost for paint damage, 380 

Repeat customers, 382 

Running costs for cars, 447 

Sales for a representative, 214, 
215, 216, 217, 552 

Sales volumes, 181 

Small business websites, 98 

Sorghum yield, 534 

Spinach yield, 541 

Square meters and office sale 
price, 513 

Starting salaries for Standard & 

Sweet potato yield, 602 

Telemarketing and Internet 
fraud, 600 

Toothpaste costs, 587, 591 

Values of farms, 511 

Website costs, 357 

Wheat production, 596 

Yearly commission earned, 136 


Combinatorics 

Access codes, 155, 165, 206 
Arranging letters in a word, 197 
ASCII codes, 35 

Code, 191 

Identification number, 161 
License plates, 155 

Lock box codes, 163 
Passwords, 196, 208 
Registration numbers, 202 
Security code, 196, 205, 207 
Sudoku, 190 

Telephone numbers, 203 


Computers and Technology 

Active users on social 
networking sites, 85 

Air conditioner, 383 

Battery backup, 48 

Byte, 197 

Camera defects, 185 

Cell phone and Internet 
privileges, 303 

Computer ownership, 488 

Consumer electronics, 233 

Customizing a tablet, 163 

Cyberattacks, 317 

Digital device use, 148 

Double charging, 304 


Email, 238 

Email hacking, 312 

Internet use, 248 

Internet service provider, 383 

Inverter batteries, 357 

Laptop, 382 

Laptop repairs, 205, 591 

Life span of home theatre 
systems, 372, 377 

Lifetimes of smartphones, 256 

Mobile defects, 185 

Mobile device repair costs, 337,
338

Mobile device use and walking in
front of a moving vehicle,
298, 301

Mobile phone purchasers, 48 

Mouse over touchpad, 29 

Online account passwords, 304,
313, 411

Online consumers, 233 

Online games playing times, 73 

Phone screen sizes, 87 

Prices for LCD computer 
monitors, 290 

Ride-hailing applications, 52 

Selection committee, 205 

Smartphone ownership, 55 

Smartphone screen locks, 304 

Smartphones, 56, 205, 238, 239, 
322, 357, 411, 412 

Smoke detectors, 397 

Social media, 24, 29, 46, 169, 224, 
226, 248, 342, 344, 441 

Technology ownership, 181 

Teen Instagram use, 243 

Testing smartphones, 59 

Text messages sent, 77, 78, 79 

Time spent checking email, 362 

Wireless devices, 250 

Wireless networks, 227 


Demographics 

Age, 48, 52, 84, 92, 100, 111, 116, 
133, 153, 158, 160, 164, 185, 
202, 433, 504, 552, 554 

Age distribution in California, 
315 

American customs and traditions, 
361 

Amount spent on energy, 597 

Amount spent on pet care, 85 

Annual income by state, 115 

Birthdays, 184, 207 

Births by day of the week, 557 

Book reading by U.S. adults, 157 

Reading days, 423 

Census, 23, 26 

Characteristics of yoga users
and non-yoga users, 439,
471, 472

Children per household, 28, 112 


Cost of raising a child, 394 

18- to 22-year-old U.S.
population, 43

Favorite beverage, 304 

Favorite day of the week, 87 

Favorite season, 87 

Favorite sport, 304 

Hiding purchases from spouse, 300 

Household income, 27, 455, 480, 
483, 597 

Household sizes, 316 

Incomes of home owners in 
Massachusetts, 29 

Intermarriages, 474 

Level of education, 148, 165 

LGBT identification, 348 

Life expectancies, 149 

Magazine subscriptions per 
household, 139 

Migrants, 242 

National identity and birthplace, 
349 

News sources, 228, 345 

Online purchases of eyeglasses, 
489 

Per capita disposable income, 313 

Per capita electric power 
consumption, 310 

Per capita energy consumption, 
280 

Per capita water consumption, 313 

Per capita water footprint, 280, 
293 

Pet ownership, 118, 414 

Physician demographics, 55 

Population of Iowa, 64 

Population of West Ridge 
County, 43, 44, 45 

Populations of parishes of 
Louisiana, 100 

Populations of U.S. cities, 31 

Retirement ages, 75 

Smokers, 167 

Social class self-identification, 
204 

Stay-at-home mothers, 248 

Taking in stray dogs, 414 

Televisions per household, 141 

Top-earning states, 149 

Unemployment rates, 139 

U.S. population, 23, 26, 251

U.S. citizen, 204

Value of home and lifespan, 504 

Women who are mothers, 233 

Working mothers, 234 

Working students, 413 

Young adults, 474 


Earth Science 

Acid rain, 544 

Air contamination, 28 

Air pollution, 54 

Atmosphere, 115 

Carbon dioxide emissions, 294, 
398 

Carbon monoxide levels, 406 

Classification of elements, 142 


Clear days, 231 

Climate change, 316, 343 

Cloudy days, 231 

Conductivity of river water, 403 

Constellations, 76 

Cyanide presence in drinking 
water, 364 

Days of rain, 211, 213, 215 

Density of elements, 116 

Droughts, 243 

Global climate change, 91 

Growing season temperatures, 
293 

Humidity, 85 

Hurricanes, 221, 361 

Lead levels, 406 

Level of lead in water, 401 

Lightning strikes, 250 

Daily rainfall, 328 

Old Faithful eruptions, 68, 118,
295, 494, 497, 499, 502, 510,
511, 521

pH level of river water, 403

pH level of soil, 576

Pollution indices, 138 

Precipitation, 34, 243, 293, 448 

Protecting the environment, 414 

Rain, 162, 202 

Rainfall, 447 

Snowfall, 219, 292 

Sodium chloride concentrations 
of seawater, 329 

Sulfur dioxide in the air, 429 

Surface concentration of 
carbonyl sulfide on the 
Indian Ocean, 309, 310 

Temperature, 33, 34, 52, 71, 357, 
448 

Tornadoes, 149, 219 

Water quality, 357 

Weather forecasting, 152, 165, 211 


Economics and Finance 

Account balance, 99 

Allowance, 594 

Annual rate of return for large 
growth mutual funds, 277 

ATM cash withdrawals, 76 

Average of car rental, 428 

Broker records, 59 

Child support payments, 287 

Cloth manufacturer, 243 

Confidence in U.S. economy, 40 

Credit card balance, 99 

Credit card debt, 406, 444, 603 

Credit card purchases, 135 

Credit cards, 215 

Credit scores, 482 

Crude oil imports, 88 

Dow Jones Industrial Average, 
294, 464 

Earnings and dividends, 505 

Family incomes, 53 

Federal income tax, 426 

Fund assets, 528, 529 




GDP and carbon dioxide 
emissions, 493, 496, 501, 
502, 509, 511, 521, 522, 523, 
525, 530, 531 

GDP growth rates, 339 

Going cashless, 555 

Health nonprofit brands, 32 

How the economy is doing, 350 

Income, 141 

Income group of residents, 52 

Incomes of taxpayers, 28 

Individual stock price, 167 

Infrastructure-strengthening 
investments, 52 

Insurance claiming from 
company, 140 

Insurance policies, 382 

Loan application approval, 200 

Maintenance charges, 271 

Mean utility bill, 118, 130 

Money management, 556 

Mutual funds, 29 

Personal income, 589 

Popular investment types, 55 

Rental rates, 144 

Savings account, 33 

Shareholder’s equity, 535 

Simulating the stock market, 168 

Spending on Christmas gifts, 139 

Spending in preparation for 
travel, 113 

Standard & Poor’s 500, 294 

Stock offerings, 526, 529 

Stock price, 328, 542 

Stock risk, 576 

Tax fraud, 347 

Tax holiday, 233 

Tax preparation, 548, 549, 551 

Tax return audits, 251, 413 

U.S. trade deficits, 101

Warehouse rent, 406 

Wealth, 120 


Education 

A Grade, 221 

Achievement and school 
location, 566 

ACT composite scores, 272, 311 

ACT English score, 448 

ACT math score, 30, 448 

ACT reading score, 308, 448 

ACT science score, 448 

Ages of college professors, 143 

Ages of enrolled students, 325, 
330 

Ages of high school students, 306 

Alumni contributions, 493, 497, 
499, 510 

Attitudes about safety at schools, 
565 

Bachelor’s degrees, 202 

Books, 221 

Borrowing for college, 567 

Boys in ATAR, 266 

Business degrees, 174, 186 

Business schools, 35 

Chairs in a classroom, 332 


Chances of retention, 48 

Cheating on a test, 243 

Class levels, 98 

Class project, 197 

Class size, 138 

Clear an entrance exam, 242 

College completion, 29 

College costs, 119, 434 

College education, 556 

College graduates, 304, 311, 313, 
349, 475 

College placement rate, 382 

College students and drinking, 
251 

College students and exercise, 
563 

College students with jobs, 268 

College success, 434 

Common Core English Language 
Arts Test scores, 258 

Common Core Mathematics Test 
scores, 258 

Completing an exam, 219 

Continuing education, 566 

Digital content in schools, 177 

Earned degrees conferred, 80, 
206 

Education policy, 52 

Education tax, 186 

Educational attainment, 598 

Engineering degrees, 87 

Enrollment levels, 207 

Exam scores, 131, 356, 440 

Extra classes taken per week, 132 

Extracurricular activities, 372, 
377, 378 

Final grade, 99, 100, 532, 533 

Freshman orientation, 251 

Full-time teaching experience, 
595 

Girls in ATAR, 266 

Grade point averages, 55, 84, 93, 
98, 100, 280, 338, 493, 502, 
536, 585, 591 

Grades, 102, 163 

Grades and media use, 298, 300 

Grammatical errors, 242 

GRE scores, 281 

High school graduation rate, 398 

History class grades, 281 

History course final 
presentations, 207 

Hours students slept, 247, 286 

Hours students studying per 
week, 85 

Hours studying and test scores, 
513 

Idle times, 97 

International mathematics 
literacy test scores, 430 

Leaves, 132 

Lecture course, 203 

Length of a guest lecturer’s talk, 
131 

Library, 219 

LSAT scores, 310 

Math minor, 52 




Mathematics assessment tests, 423 

MCAT scores, 72, 311, 397 

Midterm examination, 29 

Multiple-choice quiz, 224 

Music assessment test scores, 482 

Music students, 184 

Nursing major, 179 

Paying for college education, 243, 
350, 382 

Paying for college expenses with 
a credit card, 290 

PhD stipends, 466 

Poor teaching, 305 

Potential applicants and student 
loan debt, 148 

Psychology major, 184 

Quantitative reasoning scores, 
133 

Reading assessment test scores, 
482 

Residency positions, 173 

Returning a library book, 175 

Room and board expenses, 289 

SAT critical reading scores, 293, 
596 

SAT math scores, 26, 222 

SAT physics scores, 363 

SAT reading and writing score, 
55 

SAT scores, 76, 128, 272, 338, 464, 
489, 536 

School’s admission form, 37 

Science achievement test scores, 
430 

Science assessment tests, 578 

State mathematics test, 452 

Statistics course enrollment, 43 

Statistics course scores, 100, 143 

Student activities and time use, 
138 

Student daily life, 434 

Student employment, 413 

Student housing, 466, 468 

Student loans, 86, A29 

Students who earn bachelor’s 
degrees, 469 

Students paying bills on time, 251 

Students planning to study visual 
and performing arts, 475 

Students in public schools, 204 

Students scoring in examination, 
140 

Students undecided on an 
intended college major, 
475, 483 

Study habits, 47 

Study hours, 119 

Subject scores, 97 

Summer vacation, 29 

Teacher body cameras, 347, 349 

Teaching conference, 184 

Teaching experience, 317 

Teaching load, 119 

Teaching methods, 456, 479 

Teaching styles, 55 

Test scores, 96, 118, 133, 139, 149, 
159, 557, 603 

Testing times, 143 


Time spent on homework, 338 

True/false test, 162 

Tuition costs, 125, 129 

Tuition and fees, 97, 407, 595 

University committee, 198 

U.S. history assessment tests, 578

Visiting a library or a 
bookmobile, 52 

Vocabulary assessment tests, 423 

Weight of school bag, 219 

What Americans know about 
science, 24 

Yes or no quiz, 163 


Engineering 

Bolt widths, 431 

Can defects, 185 

Carbine manufacturer, 383 

Carton defects, 185 

Circumference of soccer balls, 
330 

Circumference of tennis balls, 
330 

Defective disks, 199 

Defective DVR, 162 

Defective parts, 153, 235, 242, 317 

Defective units, 199, 206, 244 

Diameter of an engine part, 273 

Diameter of a gear, 273 

Diameters of machine parts, 292 

Fishing line strength, 421 

Glass, 382 

Golf ball manufacturing, 416 

Injection mold, 597 

Juice dispensing machine, 330 

LED lamps, 221, 398 

Life-testing of component, 202 

Lifetimes of diamond-tipped 
cutting tools, 292 

Light bulb manufacturing, 339 

Liquid dispenser, 273 

Liquid volume in cans, 138 

Load-bearing capacities 
of transmission line 
insulators, 292 

Machine part accuracy, 54 

Machine settings, 314 

Manufacturing defect, 244 

Manufacturing plants, 74 

Mean life of fluorescent lamps,
398

Mean life of furnaces, 372, 377 

Melting points of industrial
lubricants, 292

Microwaves, 382 

Nail length, 273 

Paint can volumes, 294, 330 

Parachute failure rate, 374 

Power failures, 97 

Production of washing machines, 
37 

Solar panels, 28 

Statistical process control, 273 

Tennis ball manufacturing, 339 

Tensile strength, 455, 456 

Testing toothbrushes, 148 

Volume of gasoline, 213 


Entertainment 

Academy Award winners, 134 

Adult contemporary radio 
stations, 141 

Albums by The Beatles, 143 

American roulette, 222 

Attendance at concerts, 219 

Best sellers list, 36 

Book formats, 36 

Broadway shows, 37 

Celebrities as role models, 176 

Concert tours, 37 

Dancing competition, 197 

eBooks, 230 

Entertainment, 455 

Fair bet, 221 

Game show, 162 

Game tournament, 28 

Jukebox, 198 

Lengths of songs, 135 

Lottery, 177, 195, 197, 198, 200, 244 

Monopoly, 170 

Motion Picture Association of 
America ratings, 34 

Movie genres, 32 

Movie ratings, 37, 184 

Movie rental late fees, 175 

Movie times, 36, 220 

Movies watched in a year, 143 

Museum, 382 

Musical dice game minuet, 209 

Number one movies, 97 

Probability of adult watching 
movie, 164 

Raffle, 159, 218, 222 

Rap and hip-hop music, 234 

Reviewing a movie, 603 

Roller coaster heights, 72, 360, 
397, A30 

Roulette wheel, 200 

Song playlist, 197 

Sound speaker, 382 

Television watching, 539, 540 

Top-grossing films, 54 

Type of movie rented by age, 
565, 568 

Types of televised shows, 34 

Vacations, 36, 445 

Video game scores, 54 

Video games on smartphones, 
232 

Violent video games, 369 

Winter vacation, 361 


Food and Nutrition 

Amounts of caffeine in brewed 
coffee, 119 

Artificial sweetener, 39 

Balanced diet, 383 

Caffeine content of soft drinks, 
398 

Caloric and sodium content of 
hot dogs, 514 

Carbohydrates in chicken 
sandwiches, 579 

Carbohydrates in a nutrition 
bar, 433 

Calorie intakes, 75 


Cereal boxes, 363 

Chocolate cookies, 294 

Cholesterol contents of cheese, 
329 

Cholesterol contents of chicken 
sandwiches, 579 

Cocoa consumption, 266 

Coffee consumption, 554 

Cookies, 398 

Corn kernel toxin, 195 

Dietary habits and school 
performance, 54 

Dietary supplement, 47 

Dieting, 47 

Drinking of water per day, 48 

Eating habits, 53 

Eating healthier foods, 48 

Eating at a restaurant, 316 

Fast food, 233, 348 

Fat content in whole milk, 419 

Fiber content, 467 

Food allergies or intolerances, 
198 

Food safety, 570 

Food storage temperature, 26 

Grocery shopping, 233, 284 

Ice cream, 48, 558, 559, 561 

Meal kits, 433 

Meal programs, 234 

Menu selection, 163, 197, 202 

Milk adulteration, 197 

Multivitamins and cognitive
health, 54

M&M’s, 248, 552, 553 

Nutrients in breakfast cereals, 
545 

Ordering delivery, 555 

Purchasing food online, 414 

Restaurant ratings, 564, 568 

Restaurant serving times, 431 

Restaurant waiting times, 575 

Salad dressing, 198 

Sodium content of sandwiches, 
478 

Sodium in a soft drink, 428 

Sodium in branded cereals, 329 

Sports drink, 419 

Sugar consumption and cavities, 
538 

Taste test, 75 

Temperature of coffee, 334, 335 

Vitamin D3, 40 

Vitamin D amounts, 357 

Water consumption and weight 
loss, 503 

Weight loss drink, 41 

Weight loss supplement, 480 

Weights of bags of baby carrots, 
281 

Wheat, 528, 529 

Whole-grain foods, 48 


Government and Political Science

Ages of presidents, 74 

Ages of prime ministers, 271 

Ages of Supreme Court justices, 
140 


Ages of voters, 159 

Best president, 176 

Brexit, 186 

Candidate support, 346 

Congress, 184, 189, 311, 348, 349 

Council of Australian 
Governments, 198 

Declaration of Independence, 74 

Defense spending, 350 

Election, 164 

Election polls, 358 

Electoral votes, 86 

Eligible voters, 236 

Fact-checking by media, 52 

Gender profile of Congress, 36 

Global affairs, 361 

Government salaries, 588, 591 

International relations, 365 

Leaders, 97 

Legislator performance ratings, 
462 

Political correctness, 234 

Political efforts, 47 

Political party, 28, 53, 91, 184 

Presidential candidates, 184 

President’s approval ratings, 45 

Presidents’ medical information, 
46 

Registered voter not voting, 165 

Registered voters, 28, 30, 59 

Republican governors, 30 

Supreme Court approval, 482 

Trusting political leaders, 29 

Voter turnout, 528, 529 

Votes for Republicans, 165 

Worst president, 176 


Health and Medicine 

Age and hours slept, 515 

Alcohol and tobacco use, 504 

Allergy drug, 47, 355 

Anterior cruciate ligament 
reconstructive surgery, 172 

Antibiotics, 361, 412 

Appetite suppressant, 460 

Arthritis medication, 476 

Assisted reproductive technology, 
252 

Bacteria vaccine, 50 

Blood glucose levels, 86

Body mass index, 99, 339 

Body mass index and mortality, 
54 

Breast cancer, 50 

Caffeine consumption and heart 
attack risk, 558 

Cancer drug, 473 

Cancer survivors, 230 

Cardiovascular health, 175 

Childhood asthma prevalence, 
294 

Cholesterol levels, 97, 278 

Cholesterol-reducing medication, 
472 

Cigarette content, 541 

Coffee consumption and multiple 
sclerosis, 53 

Coffee and stomach ulcers, 175 


Concussion recovery times, 107, 
108 

Days spent at the hospital, 589 

Delaying medical care, 346 

Dentist office waiting times, 363 

Depression and pregnancy, 53 

Dieting products and weight loss 
services, 421 

Drug and body temperature, 463 

Drug testing, 468, 470, 481 

Drug treatment and nausea, 564, 
565 

DVD featuring the dangers of 
smoking, 42 

Eating dark chocolate and heart 
disease, 426 

Electronic cigarette use, 249 

Emergency room patients, 101 

Epilepsy treatment, 489 

Exercise and cognitive ability, 30 

Exercising, 142 

Eye survey, 187 

Flu shots, 233 

Focused attention in infancy, 29 

Gum for quitting smoking, 42 

Headaches and soft tissue 
massage, 465, 483 

Health care visits, 549 

Hearing loss, 26 

Heart rate and QT interval, 513 

Heart rates, 99 

Heart transplant waiting times, 
578 

Heights and pulse rates, 494 

Herbal medicine testing, 468 

High blood pressure drug, 54 

Hospital beds, 100 

Hours of sleep, 55, 340, 539, 540 

Hypertension drug and sleep 
apnea, 53 

Health hypotheses, 383 

In vitro fertilization, 177 

Influenza vaccine, 29, 42 

Injury recovery, 159 

Inpatients length of stay, 484 

IQ and brain size, 538 

Length of visits at physician 
offices, 594 

Living donor transplants, 243 

Lung cancer, 382 

Marijuana use, 185 

Medication, 27 

Multiple sclerosis drug, 473 

Musculoskeletal injury, 565 

Obesity rates, 30 

Online account with healthcare 
provider, 311 

Opioid addiction, 53 

Pain relievers, 148, 580, 583, 586 

Postponing medical checkups, 
250 

Pregnancy durations, 272 

Prescription medicine expenses, 
63, 65, 66, 67, 68, 69, 70, 94 

Protein and weight gain in 
overeaters, 458 

Reaction times to an auditory 
stimulus, A30 


INDEX OF APPLICATIONS 19 


Red blood cell count, 272, 281 

Red wine consumption and heart 
disease prevention, 136 

Reducing the number of 
cigarettes smoked, 476 

Resting heart rates, 336 

Rotator cuff surgery, 172 

Salmonella contamination of 
ground beef, 374 

Shrimp allergy, 502 

Sleep deprivation, 47, 53 

Sleep and reaction time, 492 

Sleep and student achievement, 
30 

Smoking attitudes, 54 

Smoking and emphysema, 170 

Stem cell research, 45 

Surgery success, 152, 224, 225, 
250, 297 

Surgical treatment, 380 

Systolic blood pressure, 46, 75, 
256, 360, 440 

Testing a drug, 305 

Therapeutic taping and chronic 
tennis elbow, 466 

Time for nutrients to enter the 
bloodstream, 575 

Training heart rates, 287 

Treatment of depression, 97 

Triglyceride levels, 28, 270, 440, 
A30 

Trying to quit smoking, 566 

Vaccinations, 249, 413 

Virus testing, 178 

Vitamin D tablets, 356 

Waiting time to see a family 
doctor, 457 

Water use in hospitals, 293 

Wearable fitness device and 
low-calorie diet, 371

Weight and hours slept by 
infants, 140, 530 

Weight and waist, 504, 505 

Weight loss, 248 

Weight loss program, 432 

Well-being index, 589, 591 

Zika virus, 413 


Housing and Construction 

Building a new high school, 295 

Cement, 101 

Heights and stories of buildings, 
139 

House size, 372, 377, 379 

Housing costs, 256 

Indoor temperature at night, 389 

Mean construction costs, 336 

Mean home sales price, 448, 483 

Mean price of new homes, 142 

Predicting house sales, 200 

Privately-owned housing units, 23 

Property inspection, 197 

Sales price of a single-family 
house, 289, 590 

Selling prices of real estate and 
location, 575 

Subdivision development, 192 

Weights of bricks, 35 


Law 

Bar Examination, 203 

Custodial sentences, 98 

Fighting local crime, 234 

GMO labeling legislation, 
311, 348 

Hourly billing rate for law firm 
partners, 29 

Investigating crimes, 347 

Jury selection, 173, 195, 197, 435 

Legal system, 374 

Nonviolent protest, 350 

Parking infractions, 295 

Police body cameras, 347 

Police response times, 420 

Rezoning a portion of a town, 
358 

Scores for California Peace 
Officer Standards and 
Training test, 277 

Sprinkling ban, 164 

Supreme Court, 235 

Terrorism, 347 

Terrorism convictions, 383 

Tickets written by a police 
officer, 247 


Miscellaneous 

Acid strengths, 74 

Affording basic necessities, 383 

Age and vocabulary, 504, 505 

Animal species and people who 
own more than two cars in 
a region, 502 

Archaeology club members, 197 

Asylum decisions, 200 

Ban on skateboarding in parks, 
55 

Beaches, 193 

Birthday problem, 178 

Blood donations, 29 

Blood donors, 179, 197 

Board positions, 191, 194, 206, 316 

Bottles, 356 

Bottle heights, 357 

Bag of white and black balls, 203 

Campus security response times, 
72 

Charity work, 187 

Chlorine level in a pool, 421 

Civilians, 29 

Club officers, 198 

Cloth cutter, 294 

Coin and die experiment, 153, 
170, 171, 175, 558 

Coin and spinner experiment, 
161 

Coin toss simulation, 485 

Colors of fabrics, 35 

Committee makeup, 196 

Conservation, 295 

Conveyance Spending, 73 

Cooking area of gas grill, 540 

Daylight Savings Time, 249 

Debit card personal 
identification numbers, 54 

Delivery errors, 382 

Departmental store registers, 247 




Die, coin, and playing card 
experiment, 163 

Die and spinner experiment, 165 

Emergency incidents, 245 

Energy situation of the United 
States, 362 

Experimental group, 197 

Fake news, 148 

Fear of terrorist attacks, 431 

Fire accidents per year, 328 

Fire history, 115 

Floral arrangements, 197, 317 

Grocery store checkout counter, 
253 

Gym schedule, 271 

Having a gun in the home, 248 

Health club costs, 430 

Holding for a telephone call, 253 

Hurricane relief efforts, 47 

Identity theft, 223 

Intriguing events, 157 

ISIS, 30 

Life on other planets, 234 

Life spans of bearings, 134 

Lucky toss, 233 

Magazine stories, 205 

Mainstream media, 347 

Marital status, 35 

Mean male hat size, 432 

Media conduct, 186 

Migratory birds, 383 

Middle names, 163 

Months of the year, 202 

Muffins, 294 

Necklaces, 197 

New Year’s resolution, 347, 349 

Pages per chapter of a novel, 36 

Parade floats, 196 

Phone numbers, 32 

Plans after high school, 55 

Police officer badge numbers, 54 

Random number selection, 162, 
163, 164 

Reaction survey, 53 

Regnal years, 74 

Results of a survey, 88 

Reviewer ratings, 221 

Rolling a die, 59, 96, 101, 153, 156, 
160, 163, 164, 167, 179, 180, 
181, 185, 188, 202, 203, 250 

Rooms reserved at a hotel, 138 

Selecting a jelly bean, 203 

Selecting a deck card, 233 

Selecting a marble, 224 

Selecting a numbered ball, 175 

Selecting a playing card, 156, 163, 
167, 169, 170, 171, 175, 180, 
181, 185, 195, 199, 202, 204, 
223, 225 

Selecting a professional person, 
203 

Service at electronics store, 55 

Shopping times, 269 

Social Security numbers, 32 

Speaking English, 350 

Speed of sound, 505 

Spinning a spinner, 204 


Student donations at a food 
drive, 28 

Survey, 383 

Time spent doing activities, 39 

Tossing a coin, 59, 158, 162, 163, 
170, 200, 202, 248, 373 

Transferring a telephone call, 420 

Transgender bathroom policy, 
348 

Travel concerns, 559, 562 

Typographical errors, 250 

Vending machine, 281 

Veterinarian visits, 454 

Violent crimes by year, 556 

Volunteering or donating money, 
428 

Wait times, 133, 357 

Waiting for an elevator, 253 

Weights of vacuum cleaners, 587, 
591 

Wildland fires, 538 

Winning a prize, 167, 243 

World happiness, 271, 280 

Writing a guarantee, 281 

Yoga classes, 88, 430 


Mortality 

Deaths caused by falling out of a 
fishing boat and marriage 
rate, 504 

Fatal pedestrian motor vehicle 
collisions, 596 

Homicide rates and ice cream 
sales, 504 

Homicides by county, 555 

Knowing a murder victim, 176 

Leading causes of death, 81 

Lightning fatalities, 127 

Living to age 100, 234 

Motor vehicle fatalities, 36, 568 

Shark deaths, 249 

Tornado deaths, 249 


Motor Vehicles and Transportation

Acceleration times of sedans, 
361, 397 

Accidents at an intersection, 239, 
240 

Ages of vehicles, 529 

Aggressive driving, 301 

Airplane defects, 141 

Alcohol-impaired driving, 303, 
348, 349 

Alcohol-related accidents, 567 

Amount of fuel wasted, 124, 126, 
127 

Automobile parts, 352 

Average speed of vehicles, 268 

Bicycle tire pressure, 317 

Black carbon emissions from 
cars, 309, 310 

Braking distances, 277, 310, 447, 
506 

Canadian border ports of entry, 97 

Car, 164, 220 

Car battery life spans, 317, 372, 
377, 379 

Car inspections, 58 


Car rental rates, 432 

Car speeds, 100 

Carpooling, 229, 599 

Cars in a parking lot, 219 

Commuting distances, 72, 337, 338 

Commuting times, 328, 337, 446 

Cost per mile for automobiles, 
588 

Crash tests, 547 

Days cars sat on dealership lot, 
335 

DMV wait times, 35, 404 

Driverless cars, 54 

Driving ranges of ethanol flexible 
fuel vehicles, 98 

Drunk driving, 85 

Drunk-driving accidents, 577 

Engine displacement and fuel 
efficiency, 539, 540 

Flight arrivals, 178 

Flight departures, 36, 178 

Flight prices, 98, 429 

Fuel consumption, 578 

Fuel costs of hybrid electric cars, 
98 

Fuel economy, 102, 141, 428, 478, 
540, 541, 585 

Gas mileage, 339, 370, 383, 393, 
407, 423, 488 

Helmets, 423 

Hindenburg airship, 29 

Jet fuel use, 250 

Least popular drivers, 85 

Leisure trips, 174 

Life spans of tires, 265 

Mean age of used cars sold, 402 

Mean driving cost per mile, 453 

Mean price of used cars sold, 402 

Mean vehicle speed, 118, 130 

Mileages of service vehicles, 96 

Miles driven per day, 288, 289 

Motor vehicle thefts, 83, 85 

Motorcycle fuel efficiency, 131 

Motorcycle helmet use, 481 

Music and driving habits, 46 

New highway, 193 

Occupancy of vehicles that travel 
across a bridge, 240 

Oil change time, 372, 377, 378 

Oil tankers, 243 

Own vehicle, 48 

Parking infractions and fines, 145 

Parking ticket, 175 

Pickup trucks, 176 

Pilot test, 242 

Public transportation, 229 

Purchasing a hybrid vehicle, 563 

Purchasing a new car, 154 

Racing car engine horsepowers, 
54 

Reservations, 243 

Response times of test drive, 74 

Road accidents per day, 76 

Road accidents on weekday, 247 

Safety driving classes and 
accidents, 503 

Sale price of a bike, 382 

Seat belt use, 471, 474 


Selecting vehicles, 180 

Simulations with dummies, 40 

Space shuttle flight durations, 
140, 356 

Space travel, 414 

Speed and car accidents, 170 

Speed of a rocket, 213 

Speeds of trains, 35 

Speeds of automobiles, 388 

Speeds of powerboats, 446 

Tire life span, 281 

Top speeds of sports cars, 84 

Tourism, 87, 557 

Towing capacity, 141 

Traffic flow rate, 382 

Travel time, 407 

Truck weight, 247 

Type of car owned by gender, 
564, 568, 595 

Uninsured drivers, 249 

Used cars, 535 

Vehicle costs, 86 

Vehicle prices, 433 

Vehicle ratings, 432 

Vehicle sales, 590 

Vehicles and crashes, 567 

Waiting times to turn at an 
intersection, 291 

Weights of packages on a 
delivery truck, 98 


Psychology 

Attention deficit hyperactivity 
disorder, 148 

Child behavior, 476 

Depression and stress, 102 

Gambler’s fallacy, 200 

IQ levels, 52 

IQ score, 131, 312, 319, 446 

Mobile technology and 
depression and anxiety, 
175 

Mood and a team’s win or loss, 
153 

Obsessive-compulsive disorder, 
568 

Personality inventory test, 214, 
216, 217 

Psychological screening test, 440 

Psychology experiment, 191 

Verbal memory test, 451 


Sports 

Adult participation in sports, 305 

Ages and heights of women’s 
U.S. Olympic swimming
team, 120 

Ages of professional athletes, 588 

Ages of Tour de France winners, 
134 

Ages and weights of men’s U.S. 
Olympic wrestling team, 
98 

American League home run 
totals, 33 

At-bats, 237 

Athlete as an occupation, 361 

Athlete use of performance 
enhancing drugs, 151 


Athletic scholarships, 249 

Baseball, 196 

Baseball umpires, 200 

Batting averages, 270, 465 

Bench press weights, 446 

Big 12 collegiate athletic 
conference, 190 

Blood pressure levels of athletes, 
35 

Body temperatures of athletes, 33 

Boston Marathon Open Division 
champions, 362, 397 

Boston Marathon runners’ birth 
years, 54 

Bowling speeds, 74 

Bowling tournament, 206 

Caffeine ingestion and freestyle 
sprints, 465 

Chess, 37 

China at Olympics, 35

College football touchdowns, 
166, 357 

Cricket, 221, 382 

Cricket player for school team, 
167 

Cycling race, 205 

Derby, 383, 467 

Distance a baseball travels, 219 

Distance for holes of a golf 
course, 102 

Diving scores, 37 

Efficiency of a pit crew, 388 

Final standings, National 
Basketball Association, 32 

FIFA world cup, 86 

Football kick distances, 336 

Football and negative moral 
values, 227 

Football team winning, 164 

Footrace, 197 

40-yard dash times, 462 

Free throws, 223, 250 

Gambling game, 248 

Goal production, 249 

Goals allowed and points earned 
in the National Hockey 
League, 527, 529 

Goals and wins in the English 
Premier League, 513 

Golf driving distances, 457, 578 

Golf students, 594 

Heart rates of athletes, 33 

Heights of basketball players, 
87, A29 

Heights of volleyball players, 99 

Heights and weights of a 
basketball team, 114 

Hits per game, 247 

Horse race, 198, 248 

Hourly fees, 86, 87 

Indianapolis 500, 191 

Indian Premier League, 339


Lacrosse team, 196 

Major League Baseball
attendance, 491, 494, 497,
501, 502, 510

Major League Baseball salaries,
491, 494, 497, 501, 502, 510

Marathon training, 341 

Maximal strength and jump
height, 505, 506

Maximal strength and sprint
performance, 505, 506

Medals in Olympic games, 37

National Football League
retirees and arthritis, 302

National Football League
rookies, 183

New York City Marathon, 86 

New York Yankees’ World Series
victories, 33

Number of athletes and medals
won, 513

Numbers on sports jerseys, 32 

Participating in marathons, 203 

Pass attempts and passing yards, 
538 

Pass completions, 242, 467, 468 

Passing yards for college football 
quarterbacks, 480 

Personal fouls per game, 316 

Players selection, 205 

Playing golf, 204 

Points scored by Montreal 
Canadiens, 149 

Recovering from a football head 
injury, 363 

Regular season wins for Major 
League Baseball teams, 142 

Runs scored, 85 

Runs scored by the Chicago 
Cubs, 118 

Skiing, 196 

Soccer goals, 310 

Speed of soccer goals, 219 

Sports-related head injuries, 31 

Stolen bases for Chicago Cubs, 
149 

Stretching and injury, 564, 565 

Strokes per hole, 244 

Student-athletes, 26, 219, 320, 322, 
323, 324, 326 

Super Bowl points scored, 61, 64, 
65, 67, 68, 69, 70, 78, 79, 89, 
90, 91, 94, 124, 126, 127, 128

10K race, 207, 457 

Trophies, 196 

Vertical jump heights, 461 

Vertical jumps of college 
basketball players, 139 

Weightlifting, 338 

Weights of high school football 
players, 141 




Winning times for men’s and 
women’s 100-meter run, 
602 

Women gamers, 233 


Work 

Actuary salaries, 578 

Ages and career goals, 566 

Architect salaries, 120, 397, 449 

Changing jobs, 99, 413 

Chemical engineer salaries, 293, 
362 

Clinical pharmacist salaries, 293, 
356 

Company employment, 44 

Construction worker salaries, 142 

Courier deliveries, 407 

Customer care wait times, 406 

Delivery hours, 407 

Earnings by educational 
attainment, 81, 452 

Earnings of full-time workers, 
432 

Earnings of men and women, 
527, 529 

Editors’ earnings, 338

Electrical engineer salaries, 148 

Employee benefits, 52 

Employee committee, 199 

Employee strike, 159 

Employee training and accidents, 
492 

Employees’ ages, 55 

Employees in foreign country, 29 

Employees’ salaries, 26, 72, 121, 
222, 531, 533

Employment agency, 177 

Employment status and 
educational attainment, 
473, 569 

Engineer salaries and ages, 530 

Entry-level paralegal salaries, 478 

Fast-food employees, 55 

Forensic science technician 
salaries, 444 

Graphic design specialist salaries, 
423 

Home care physical therapist 
salaries, 406 

Hourly wages, 101 

Hours worked, 132 

Important jobs, 350 

Job growth, 32 

Late for work, 235 

Law firm salaries, 88 

Leadership, 25 

Length of employment and 
salary, 82 

Librarian and library science 
teacher salaries, 543 

Life insurance underwriter 
salaries, 311, 339 


Likeliness of being laid off, 431 

Locksmith salaries, 489 

Marketing account executive
salaries, 148

Mechanical engineer salaries, 393 

Medical care benefits, 248 

Minimum wage, 305 

MRI technologist salaries, 311,
339

Numbers of manufacturing
employees, 123

Nursing career, 24 

Nursing supervisor salaries, 423 

Overtime, 310 

Paid maternity leave, 433 

Paycheck errors, 244 

Primary reason for career choice, 
40 

Product engineer salaries, 406 

Public relations manager salaries, 
578 

Public school teacher salaries, 
443

Registered nurse salaries, 515 

Respiratory therapy technician 
hourly wages, 599 

Salaries, 97 

Salary offers, 116, 117 

Sample executive, 48 

Security officer applicants, 382 

Sick days used by employees, 101 

Software engineer salaries, 449 

Starting salaries, 104, 105, 106 

STEM employment and mean 
wage, 527, 529 

Stressful jobs, 25, 26 

Substantial increments, 30 

Teacher earnings, 338 

Teacher salaries, 542, 601 

Time wasted at work, 584 

Training program, 199 

Travel time to work, 71, 84, 98 

Unemployment, 25 

Unfilled job openings, 54 

U.S. workforce, 475

Video game developer careers, 52 

Wages by metropolitan area, 598 

Wages for employees, 140 

Warehouse workers, 199 

Where people work and
educational attainment, 560

Workdays missed due to illness 
or injury, 389 

Workers by industry, 166 

Workplace cleanliness, 234 

Workplace drug testing, 233 

Workplace fraud, 148 

Years of service, 278 

Years of driving, 75 


CHAPTER 1


1.1 An Overview of Statistics

1.2 Data Classification

Case Study

1.3 Data Collection and Experimental Design


Activity 

Uses and Abuses 

Real Statistics—Real Decisions 
History of Statistics—Timeline 
Technology 


For the first 10 months of 2016, construction completions of privately-owned housing
units in the U.S. were greatest in the South.


Where You’ve Been


You are already familiar with many of the practices of
statistics, such as taking surveys, collecting data, and
describing populations. What you may not know is that
collecting accurate statistical data is often difficult and
costly. Consider, for instance, the monumental task of
counting and describing the entire population of the
United States. If you were in charge of such a census, how
would you do it? How would you ensure that your results
are accurate? These and many more concerns are the
responsibility of the United States Census Bureau, which
conducts the census every decade.


Where You’re Going


In Chapter 1, you will be introduced to the basic concepts 
and goals of statistics. For instance, statistics were used to 
construct the figures below, which show the numbers, by 
region in the U.S., of construction completions of privately- 
owned housing units for October of 2016 and for the 
first 10 months of 2016, as numbers in thousands and as 
percents of the total. 

For the 2010 Census, the Census Bureau sent short
forms to every household. Short forms ask all members of
every household such things as their gender, age, race, and
ethnicity. Previously, a long form, which covered additional
topics, was sent to about 17% of the population. But for
the first time since 1940, the long form was replaced by the
American Community Survey, which surveys more than
3.5 million households a year throughout the decade. These
households form a sample. In this course, you will learn
how the data collected from a sample are used to infer
characteristics about the entire population.


[Figure: bar graphs "Housing Units Completed in the U.S. (October 2016)" and
"Housing Units Completed in the U.S. (January-October 2016)", showing completions
by region, in thousands]


[Figure: pie charts "Housing Units Completed in the U.S. (October 2016)" and
"Housing Units Completed in the U.S. (January-October 2016)", showing each
region's share of the total as a percent]




24 CHAPTER 1 Introduction to Statistics 


What You Should Learn 


» A definition of statistics

» How to distinguish between a population and a sample and between a
parameter and a statistic

» How to distinguish between descriptive statistics and inferential statistics


A Definition of Statistics ■ Data Sets ■ Branches of Statistics


A Definition of Statistics 


Almost every day you are exposed to statistics. For instance, consider the next 
two statements. 


• According to a survey, more than 7 in 10 Americans say a nursing career is a
prestigious occupation. (Source: The Harris Poll)


• “Social media consumes kids today as well, as more score their first social
media accounts at an average age of 11.4 years old.” (Source: Influence Central’s
2016 Digital Trends Study)


By learning the concepts in this text, you will gain the tools to become an 
informed consumer, understand statistical studies, conduct statistical research, 
and sharpen your critical thinking skills. 

Many statistics are presented graphically. For instance, consider the figure 
shown below. 


[Figure: results of a science quiz given to U.S. adults] Source: Pew Research Center


The information in the figure is based on the collection of data. In this instance, 
the data are based on the results of a science quiz given to 3278 U.S. adults. 


DEFINITION 


Data consist of information coming from observations, counts, measurements, 
or responses. 


The use of statistics dates back to census taking in ancient Babylonia, Egypt, 
and later in the Roman Empire, when data were collected about matters concerning 
the state, such as births and deaths. In fact, the word statistics is derived from the 
Latin word status, meaning “state.” The modern practice of statistics involves more 
than counting births and deaths, as you can see in the next definition. 


DEFINITION 


Statistics is the science of collecting, organizing, analyzing, and interpreting
data in order to make decisions.


Study Tip

A census consists of data from an entire population. But, unless a population
is small, it is usually impractical to obtain all the population data. In most
studies, information must be obtained from a random sample.


SECTION 1.1 An Overview of Statistics 25 


Data Sets 


There are two types of data sets you will use when studying statistics. These data 
sets are called populations and samples. 


DEFINITION 


A population is the collection of all outcomes, responses, measurements, or
counts that are of interest. A sample is a subset, or part, of a population.


A sample is used to gain information about a population. For instance, to
estimate the unemployment rate for the population of the United States, the
U.S. Bureau of Labor Statistics uses a sample of about 60,000 households.

A sample should be representative of a population so that sample data 
can be used to draw conclusions about that population. Sample data must be 
collected using an appropriate method, such as random sampling. When sample 
data are collected using an inappropriate method, the data cannot be used to 
draw conclusions about the population. (You will learn more about random 
sampling and data collection in Section 1.3.) 
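The idea of a sample as a subset drawn by an appropriate method can be sketched in Python. This is a minimal illustration, not a procedure from the text: the population of 1,000 numbered households is hypothetical, and the sample is drawn with `random.Random.sample` from Python's standard library, which selects without replacement so that no household appears twice.

```python
import random

# Hypothetical population: ID numbers for 1,000 households (illustrative only).
population = list(range(1, 1001))

# Draw a simple random sample of 50 households without replacement.
# Seeding the generator makes the example reproducible.
rng = random.Random(42)
sample = rng.sample(population, k=50)

# Every member of the sample also belongs to the population,
# and no household is selected twice.
assert set(sample) <= set(population)
assert len(set(sample)) == len(sample) == 50

print(sorted(sample)[:10])  # first ten sampled IDs, in increasing order
```

Changing the seed produces a different sample from the same population. That is the situation described in this section when a sample statistic changes from one sample to another.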


EXAMPLE 1
Identifying Data Sets


In a survey, 834 employees in the United States were asked whether they 
thought their jobs were highly stressful. Of the 834 respondents, 517 said yes. 
Identify the population and the sample. Describe the sample data set. (Source: 
CareerCast Job Stress Report) 


SOLUTION 


The population consists of the responses of all employees in the United States. 
The sample consists of the responses of the 834 employees in the survey. In 
the Venn diagram below, notice that the sample is a subset of the responses 
of all employees in the United States. Also, the sample data set consists of 
517 people who said yes and 317 who said no. 


[Venn diagram: Responses of All Employees (population), which contains
Responses of employees in survey (sample) and Responses of employees not in the survey]
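The description of the sample data set in Example 1 comes down to simple arithmetic. Everyone in the sample answered yes or no, so the number who said no is 834 minus 517. Below is a minimal Python sketch of that arithmetic. The proportion at the end is not reported in the example; it appears here only to illustrate a sample statistic.

```python
# Counts taken from the survey in Example 1.
sample_size = 834
said_yes = 517

# Every respondent in the sample answered either yes or no.
said_no = sample_size - said_yes

# Proportion of the sample who said yes (a sample statistic).
proportion_yes = said_yes / sample_size

print(said_no)                  # 317
print(f"{proportion_yes:.1%}")  # 62.0%
```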


TRY IT YOURSELF 1 


In a survey of 1501 ninth to twelfth graders in the United States, 1215 said 
“leaders today are more concerned with their own agenda than with achieving 
the overall goals of the organization they serve.” Identify the population and the 
sample. Describe the sample data set. (Source: National 4-H Council) 

Answer: Page A31 


Whether a data set is a population or a sample usually depends on the 
context of the real-life situation. For instance, in Example 1, the population is 
the set of responses of all employees in the United States. Depending on the 
purpose of the survey, the population could have been the set of responses of all 
employees who live in California or who work in the healthcare industry. 




Two important terms that are used throughout this course are parameter 
and statistic. 


DEFINITION


A parameter is a numerical description of a population characteristic.


A statistic is a numerical description of a sample characteristic.


Study Tip

To remember the terms parameter and statistic, try using the mnemonic
device of matching the first letters in population parameter and the first
letters in sample statistic.


It is important to note that a sample statistic can differ from sample to 
sample, whereas a population parameter is constant for a population. For 
instance, consider the survey in Example 1. The results showed that 517 of 834 
employees surveyed think their jobs are highly stressful. Another sample may 
have a different number of employees who say their jobs are highly stressful. For
the population, however, the number of employees who think that their jobs are 
highly stressful does not change. 
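A small simulation shows the difference between a fixed parameter and a statistic that varies. This sketch rests on stated assumptions. The population of 10,000 yes/no responses and its 62% "yes" rate are invented for illustration, and each sample contains 834 responses to match the survey size in Example 1. The parameter is computed once from the entire population, while the statistic is recomputed for each new random sample.

```python
import random

# Hypothetical population of 10,000 yes/no responses (True = "highly stressful").
# The 6200/3800 split is invented for illustration; it is not survey data.
population = [True] * 6200 + [False] * 3800
rng = random.Random(1)

# Population parameter: the proportion of ALL responses that are "yes".
# It is computed once from the whole population and does not change.
parameter = sum(population) / len(population)

# Sample statistics: the same proportion computed from three different
# random samples of 834 responses each.
sample_stats = []
for _ in range(3):
    sample = rng.sample(population, k=834)
    sample_stats.append(sum(sample) / len(sample))

print(f"parameter: {parameter:.3f}")
for i, stat in enumerate(sample_stats, start=1):
    print(f"sample {i} statistic: {stat:.3f}")
```

Running the code prints a single parameter value and three statistics. The statistics usually differ from one another and from the parameter. Their exact values depend on the seed, so no output is listed here.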


Picturing the World


How accurate is the count of the U.S. population taken each
decade by the Census Bureau? According to estimates, the net
undercount of the U.S. population by the 1940 census was 5.4%.
The accuracy of the census has improved greatly since then. The
net undercount in the 2010 census was -0.01%. (This means that the
2010 census overcounted the U.S. population by 0.01%, which is
about 36,000 people.) (Source: U.S. Census Bureau)

[Figure: U.S. Census Net Undercount, net percent of population undercount by census year]


EXAMPLE 2
Distinguishing Between a Parameter and a Statistic
Determine whether each number describes a population parameter or a 
sample statistic. Explain your reasoning. 


1. A survey of several hundred collegiate student-athletes in the United States 
found that, during the season of their sport, the average time spent on 
athletics by student-athletes is 50 hours per week. (Source: Penn Schoen Berland) 


2. The freshman class at a university has an average SAT math score of 514. 


3. In a random check of several hundred retail stores, the Food and Drug 
Administration found that 34% of the stores were not storing fish at the 
proper temperature. 


SOLUTION 


1. Because the average of 50 hours per week is based on a subset of the 
population, it is a sample statistic. 




2. Because the average SAT math score of 514 is based on the entire freshman 
class, it is a population parameter. 


3. Because 34% is based on a subset of the population, it is a sample statistic. 




TRY IT YOURSELF 2 


Determine whether each number describes a population parameter or a 
sample statistic. Explain your reasoning. 


a. Last year, a small company spent a total of $5,150,694 on employees’ salaries. 
b. In the United States, a survey of a few thousand adults with hearing loss found 
that 43% have difficulty remembering conversations. (Source: The Harris Poll) 
Answer: Page A31 


In this course, you will see how the use of statistics can help you make
informed decisions that affect your life. Consider the census that the U.S. 
government takes every decade. When taking the census, the Census Bureau 
attempts to contact everyone living in the United States. Although it is impossible 
to count everyone, it is important that the census be as accurate as it can be 
because public officials make many decisions based on the census information. 
Data collected in the census will determine how to assign congressional seats and 
how to distribute public funds. 


What are some difficulties in 
collecting population data? 


[Figure: Not Online. U.S. adults who do not use the Internet, by household income
(less than $30,000; $30,000 to $49,999; $50,000 to $74,999; $75,000 or more)]


Study Tip 


Throughout this course you 
will see applications of 
both branches of statistics. 
A major theme in this 
course will be how to use 
sample statistics to make 
inferences about unknown population 
parameters. 




Branches of Statistics 


The study of statistics has two major branches: descriptive statistics and 
inferential statistics. 


DEFINITION 


Descriptive statistics is the branch of statistics that involves the organization, 
summarization, and display of data. 


Inferential statistics is the branch of statistics that involves using a sample to
draw conclusions about a population. A basic tool in the study of inferential
statistics is probability. (You will learn more about probability in Chapter 3.)
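The sketch below makes the two branches concrete by applying both to one small sample. The eight sleep times are invented values. Using the sample mean as an estimate of the population mean is only one simple example of an inference. The probability needed to judge how reliable such an estimate is comes in later chapters.

```python
import statistics

# Hypothetical sample: hours of sleep reported by 8 randomly selected
# students (invented values, for illustration only).
sample_hours = [6.5, 7.0, 8.0, 5.5, 7.5, 6.0, 7.0, 8.5]

# Descriptive statistics: organize and summarize the sample itself.
ordered = sorted(sample_hours)
sample_mean = statistics.mean(sample_hours)
print("ordered sample:", ordered)
print("sample mean:", sample_mean)  # 7.0

# Inferential statistics: use the sample to draw a conclusion about the
# population of ALL students. Here the sample mean serves as an estimate
# of the unknown population mean.
estimated_population_mean = sample_mean
print(f"estimated mean for all students: about {estimated_population_mean:.1f} hours")
```

Only the last step goes beyond the eight values in hand. That step is what makes it inferential rather than descriptive.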


EXAMPLE 3
Descriptive and Inferential Statistics


For each study, identify the population and the sample. Then determine which 
part of the study represents the descriptive branch of statistics. What conclusions 
might be drawn from the study using inferential statistics? 


1. A study of 2560 U.S. adults found that of adults not using the Internet, 23% 
are from households earning less than $30,000 annually, as shown in the 
figure at the left. (Source: Pew Research Center) 


2. A study of 300 Wall Street analysts found that the percentage who incorrectly 
forecasted high-tech earnings in a recent year was 44%. (Adapted from 
Bloomberg News) 


SOLUTION 


1. The population consists of the responses of all U.S. adults, and the sample 
consists of the responses of the 2560 U.S. adults in the study. The part of 
this study that represents the descriptive branch of statistics involves the 
statement “23% [of U.S. adults not using the Internet] are from households 
earning less than $30,000 annually.” Also, the figure represents the 
descriptive branch of statistics. A possible inference drawn from the study 
is that lower-income households cannot afford access to the Internet. 


2. The population consists of the high-tech earnings forecasts of all Wall Street 
analysts, and the sample consists of the forecasts of the 300 Wall Street 
analysts in the study. The part of this study that represents the descriptive 
branch of statistics involves the statement “the percentage [of Wall Street 
analysts] who incorrectly forecasted high-tech earnings in a recent year was 
44%.” A possible inference drawn from the study is that the stock market is 
difficult to forecast, even for professionals. 
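The descriptive statement in part 2 gives a percentage of the sample. Turning it back into a count takes one line of arithmetic. The result, 132 analysts, is derived here and does not appear in the source. The sketch uses integer arithmetic so that no floating-point rounding affects the result.

```python
# Figures from part 2 of the example: 300 analysts, 44% forecast incorrectly.
analysts_in_sample = 300
percent_incorrect = 44

# Descriptive step: convert the reported percentage back to a count.
incorrect_count = analysts_in_sample * percent_incorrect // 100
print(incorrect_count)  # 132
```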


TRY IT YOURSELF 3 


A study of 1000 U.S. adults found that when they have a question about 
their medication, three out of four adults will consult with their physician 
or pharmacist and only 8% visit a medication-specific website. (Source: Finn 
Futures™ Health poll) 


a. Identify the population and the sample. 
b. Determine which part of the study represents the descriptive branch of 
statistics. 
c. What conclusions might be drawn from the study using inferential statistics? 
Answer: Page A31 




1.1 EXERCISES


Building Basic Skills and Vocabulary 


1. How is a sample related to a population? 


2. Why is a sample used more often than a population? 
3. What is the difference between a parameter and a statistic? 


4. What are the two main branches of statistics? 


True or False? In Exercises 5-10, determine whether the statement is true or
false. If it is false, rewrite it as a true statement. 


5. A statistic is a numerical description of a population characteristic. 
6. A sample is a subset of a population. 
7. It is impossible to obtain all the census data about the U.S. population. 


8. Inferential statistics involves using a population to draw a conclusion about 
a corresponding sample. 


9. A population is the collection of some outcomes, responses, measurements, 
or counts that are of interest. 


10. A sample statistic will not change from sample to sample. 


Classifying a Data Set In Exercises 11-20, determine whether the data set is
a population or a sample. Explain your reasoning. 


11. A survey of 95 shopkeepers in a commercial complex with 550 shopkeepers 


12. The amount of energy collected from every solar panel at a photovoltaic
power plant


13. The height of each athlete participating in the Summer Olympics 

14. The value of the purchase made by every sixth person entering a departmental store
15. The triglyceride levels of 10 patients in a clinic with 50 patients 

16. The number of children in 25 households out of 75 households in a neighborhood 
17. The final score of each gamer in a tournament 

18. The ages at which all the presidents of a country were elected 

19. The incomes of the top 10 taxpayers of a country 


20. The air contamination levels at 20 locations near a factory 


Graphical Analysis In Exercises 21-24, use the Venn diagram to identify the
population and the sample.


21. Parties of Registered Voters
[Venn diagram] Inner region: Parties of registered voters who respond to a
survey. Outer region: Parties of registered voters who do not respond to a survey.

22. Student Donations at a Food Drive
[Venn diagram] Inner region: Student donations of canned goods. Outer region:
Student donations of other food items.

SECTION 1.1 An Overview of Statistics 29


23. Ages of Adults in the United States Who Own Automobiles
[Venn diagram] Inner region: Ages of adults in the U.S. who own Honda
automobiles. Outer region: Ages of adults in the U.S. who own automobiles made
by a company other than Honda.

24. Incomes of Home Owners in Massachusetts
[Venn diagram] Inner region: Incomes of home owners in Massachusetts with
mortgages. Outer region: Incomes of home owners in Massachusetts without
mortgages.


Using and Interpreting Concepts 


Identifying Data Sets In Exercises 25-34, identify the population and the
sample. Describe the sample data set.


25. A survey of 1020 U.S. adults found that 42% trust their political leaders.
(Source: Gallup)

26. A study of 203 infants was conducted to find a link between fetal tobacco
exposure and focused attention in infancy. (Source: Infant Behavior and
Development)

27. A survey of 3301 U.S. adults found that 39% received an influenza vaccine
for a recent flu season. (Source: U.S. Centers for Disease Control and Prevention)

28. A survey of 1500 employees worldwide found that 62% of the respondents
working in a foreign country settle there.

29. A survey of 159 U.S. law firms found that the average hourly billing rate for
partners was $604. (Source: The National Law Journal)

30. A survey of 328 children in a city in Belgium found that 86% planned to visit
their grandparents during the summer vacation.

31. Of the 112.5 million blood donations collected globally, approximately 50%
are collected from high-income countries. (Source: World Health Organization)

32. A survey of 1468 laptop users found that 81% preferred using a mouse
over a touchpad.

33. To gather information about the best mutual funds listed on a recognized
stock exchange website, a researcher collects data about 134 of the 1000
mutual funds.

34. A survey of 1060 parents of 13- to 17-year-olds found that 636 of the
1060 parents have checked their teen’s social media profile. (Source: Pew
Research Center)


Distinguishing Between a Parameter and a Statistic In Exercises
35-42, determine whether the number describes a population parameter or a
sample statistic. Explain your reasoning.


35. Forty out of a high school’s 500 students who took the midterm examination
received a C grade.

36. A survey of 1058 college board members found that 56.3% think that college
completion is a major priority or the most important priority for their
board. (Source: Association of Governing Boards of Universities and Colleges)

37. Out of the 40 million casualties in the UK during World War II, two
million were reported to be civilians.




38. In January 2016, 62% of the governors of the 50 states in the United States
were Republicans. (Source: National Governors Association)

39. Employee records show that all the employees in an organization have
received substantial increments over their joining salaries.

40. In a survey of 650 teachers, 16% reported that there have been instances of
bullying in their class.

41. A survey of 2008 U.S. adults found that 80% think that the militant group
known as ISIS is a major threat to the well-being of the United States.
(Source: Pew Research Center)

42. In a recent year, the average math score on the ACT for all graduates
was 20.6. (Source: ACT, Inc.)

43. Descriptive and Inferential Statistics Which part of the survey described in
Exercise 31 represents the descriptive branch of statistics? What conclusions
might be drawn from the survey using inferential statistics?

44. Descriptive and Inferential Statistics Which part of the survey described in
Exercise 32 represents the descriptive branch of statistics? What conclusions
might be drawn from the survey using inferential statistics?


Extending Concepts 


45. Identifying Data Sets in Articles Find an article that describes a survey.
(a) Identify the sample used in the survey.
(b) What is the population?
(c) Make an inference about the population based on the results of the
survey.

46. Writing Write an essay about the importance of statistics for one of the
following.
• A study on the effectiveness of a new drug
• An analysis of a manufacturing process
• Drawing conclusions about voter opinions using surveys

47. Exercise and Cognitive Ability A study of 876 senior citizens shows that
participants who exercise regularly exhibit less of a decline in cognitive
ability than those who barely exercise at all. From this study, a researcher
infers that your cognitive ability increases the more you exercise. What is
wrong with this type of reasoning? (Source: Neurology)

48. Increase in Obesity Rates A study shows that the obesity rate among
adolescents has steadily increased since 1988. From this study, a researcher
infers that this trend will continue in future years. What is wrong with this
type of reasoning? (Source: Journal of the American Medical Association)

49. Sleep and Student Achievement A study shows that the closer participants
were to an optimal sleep duration target, the better they performed on a
standardized test. (Source: Eastern Economics Journal)
(a) Identify the sample used in the study.
(b) What is the population?
(c) Which part of the study represents the descriptive branch of statistics?
(d) Make an inference about the population based on the results of the
study.


1.2 Data Classification


What You Should Learn 


» How to distinguish 
between qualitative data 
and quantitative data 


» How to classify data with 
respect to the four levels of 
measurement: nominal, ordinal, 
interval, and ratio 


City                 Population
Baltimore, MD        621,849
Chicago, IL          2,720,546
Glendale, AZ         240,126
Miami, FL            441,003
Portland, OR         632,309
San Francisco, CA    864,816


SECTION 1.2 Data Classification 31


Types of Data • Levels of Measurement


Types of Data 


When conducting a study, it is important to know the kind of data involved. The 
type of data you are working with will determine which statistical procedures can 
be used. In this section, you will learn how to classify data by type and by level 
of measurement. Data sets can consist of two types of data: qualitative data and 
quantitative data. 


DEFINITION 


Qualitative data consist of attributes, labels, or nonnumerical entries. 


Quantitative data consist of numbers that are measurements or counts. 


Classifying Data by Type 

The table shows sports-related head injuries treated in U.S. emergency rooms 
during a recent five-year span for several sports. Which data are qualitative 
data and which are quantitative data? Explain your reasoning. (Source: BMC 
Emergency Medicine) 


Sports-Related Head Injuries 
Treated in U.S. Emergency Rooms 


Sport Head injuries treated 
Basketball 131,930 
Baseball 83,522 
Football 220,258 
Gymnastics 33,265 
Hockey 41,450 
Soccer 98,710 
Softball 41,216 
Swimming 44,815 
Volleyball 13,848 


SOLUTION 


The information shown in the table can be separated into two data sets. One 
data set contains the names of sports, and the other contains the numbers 
of head injuries treated. The names are nonnumerical entries, so these are 
qualitative data. The numbers of head injuries treated are numerical entries, 
so these are quantitative data. 
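The split made in the solution can be sketched in Python. This is a minimal sketch under stated assumptions: the helper name `data_type` and its `numbers_are_labels` flag are hypothetical, and the check only looks at whether the entries are numbers. Code cannot tell a measurement from a label, so the caller has to flag numbers that merely serve as labels (such as the jersey numbers discussed later in this section).

```python
# Sports-related head injuries data from the table above
sports = ["Basketball", "Baseball", "Football", "Gymnastics", "Hockey",
          "Soccer", "Softball", "Swimming", "Volleyball"]
head_injuries = [131_930, 83_522, 220_258, 33_265, 41_450,
                 98_710, 41_216, 44_815, 13_848]


def data_type(values, numbers_are_labels=False):
    """Classify a data set as 'qualitative' or 'quantitative'.

    Hypothetical helper: numeric entries are treated as measurements or
    counts unless the caller flags them as labels (e.g., jersey numbers).
    """
    all_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if all_numeric and not numbers_are_labels:
        return "quantitative"
    return "qualitative"


print(data_type(sports))                                 # qualitative
print(data_type(head_injuries))                          # quantitative
print(data_type([23, 12, 87], numbers_are_labels=True))  # qualitative
```

The flag is the important design choice: numeric-looking data are not automatically quantitative, so the decision stays with the person who knows what the numbers mean.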


TRY IT YOURSELF 1 


The populations of several U.S. cities are shown in the table. Which data are 
qualitative data and which are quantitative data? Explain your reasoning. 


(Source: U.S. Census Bureau) 
Answer: Page A31 


32 CHAPTER 1 Introduction to Statistics 


Levels of Measurement 


Another characteristic of data is its level of measurement. The level of 
measurement determines which statistical calculations are meaningful. The four 
levels of measurement, in order from lowest to highest, are nominal, ordinal, 
interval, and ratio. 


DEFINITION 


Data at the nominal level of measurement are qualitative only. Data at this
level are categorized using names, labels, or qualities. No mathematical
computations can be made at this level.


Data at the ordinal level of measurement are qualitative or quantitative. Data 
at this level can be arranged in order, or ranked, but differences between data 
entries are not meaningful. 


When numbers are at the nominal level of measurement, they simply 
represent a label. Examples of numbers used as labels include Social Security 
numbers and numbers on sports jerseys. For instance, it would not make sense 
to add the numbers on the players’ jerseys for the Chicago Bears. 


Picturing the World

For more than 25 years, the Harris Poll has conducted an annual study to
determine the strongest brands, based on consumer response, in several
industries. A recent study determined the top five health nonprofit brands,
as shown in the table. (Source: Harris Poll)

Top Five Health Nonprofit Brands
1. St. Jude Children’s Research Hospital
2. Shriners Hospital for Children
3. Make-A-Wish
4. The Jimmy Fund
5. American Cancer Society

In this list, what is the level of measurement?


Classifying Data by Level


For each data set, determine whether the data are at the nominal level or at
the ordinal level. Explain your reasoning. (Source: U.S. Bureau of Labor Statistics)

1. Top five fastest-growing occupations
   1. Personal care aides
   2. Registered nurses
   3. Home health aides
   4. Combined food preparation and serving workers, including fast food
   5. Retail salespersons

2. Movie genres
   Action
   Comedy
   Horror
   [other entries illegible in source]


SOLUTION


1. This data set lists the ranks of the five fastest-growing occupations in the
U.S. over the next few years. The data set consists of the ranks 1, 2, 3,
4, and 5. Because the ranks can be listed in order, these data are at the
ordinal level. Note that the difference between a rank of 1 and 5 has no
mathematical meaning.

2. This data set consists of the names of movie genres. No mathematical
computations can be made with the names, and the names cannot be
ranked, so these data are at the nominal level.


TRY IT YOURSELF 2


For each data set, determine whether the data are at the nominal level or at the
ordinal level. Explain your reasoning.


1. The final standings for the Pacific Division of the National Basketball
Association
2. A collection of phone numbers

Answer: Page A31


SECTION 1.2 Data Classification 33 


The two highest levels of measurement consist of quantitative data only. 


DEFINITION 


Data at the interval level of measurement can be ordered, and meaningful 
differences between data entries can be calculated. At the interval level, a zero 
entry simply represents a position on a scale; the entry is not an inherent zero. 


Data at the ratio level of measurement are similar to data at the interval 
level, with the added property that a zero entry is an inherent zero. A ratio 
of two data entries can be formed so that one data entry can be meaningfully 
expressed as a multiple of another. 


An inherent zero is a zero that implies “none.” For instance, the amount 
of money you have in a savings account could be zero dollars. In this case, the 
zero represents no money; it is an inherent zero. On the other hand, a temperature 
of 0°C does not represent a condition in which no heat is present. The 
0°C temperature is simply a position on the Celsius scale; it is not an inherent zero. 

To distinguish between data at the interval level and at the ratio level, 
determine whether the expression “twice as much” has any meaning in the 
context of the data. For instance, $2 is twice as much as $1, so these data are at 
the ratio level. On the other hand, 2°C is not twice as warm as 1°C, so these data 
are at the interval level. 
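One way to see why the "twice as much" test fails for Celsius temperatures is to remeasure them from an inherent zero. The sketch below assumes the Kelvin scale, which the text does not mention: 0 K is absolute zero, and a kelvin value equals the Celsius value plus 273.15. Measured that way, 2°C is only about 0.4% warmer than 1°C, while $2 remains exactly twice $1.

```python
def celsius_to_kelvin(c):
    """Convert degrees Celsius to kelvins (0 K, absolute zero, is an inherent zero)."""
    return c + 273.15


# Money has an inherent zero, so this ratio is meaningful.
dollar_ratio = 2 / 1

# Dividing Celsius readings directly gives 2.0, but that number only
# reflects where the Celsius scale happens to place its zero.
naive_celsius_ratio = 2 / 1

# Measured from an inherent zero, 2°C is barely warmer than 1°C.
kelvin_ratio = celsius_to_kelvin(2) / celsius_to_kelvin(1)

print(dollar_ratio)            # 2.0
print(naive_celsius_ratio)     # 2.0
print(round(kelvin_ratio, 4))  # 1.0036
```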


Classifying Data by Level

Two data sets are shown below. Which data set consists of data at the
interval level? Which data set consists of data at the ratio level? Explain your
reasoning. (Source: Major League Baseball)

New York Yankees’ World Series victories (years)
1923, 1927, 1928, 1932, 1936, 1937, 1938, 1939, 1941, 1943,
1947, 1949, 1950, 1951, 1952, 1953, 1956, 1958, 1961, 1962,
1977, 1978, 1996, 1998, 1999, 2000, 2009

2016 American League home run totals (by team)
Team            Home runs
Baltimore       253
Boston          208
Chicago         168
Cleveland       185
Detroit         [illegible in source]
Houston         198
Kansas City     147
Los Angeles     156
[row illegible in source]
New York        183
Oakland         169
Seattle         223
Tampa Bay       216
Texas           [illegible in source]
Toronto         221


SOLUTION

Both of these data sets contain quantitative data. Consider the dates of the
Yankees’ World Series victories. It makes sense to find differences between
specific dates. For instance, the time between the Yankees’ first and last World
Series victories is

2009 − 1923 = 86 years.

But it does not make sense to say that one year is a multiple of another. So,
these data are at the interval level. However, using the home run totals, you
can find differences and write ratios. For instance, Boston hit 23 more home
runs than Cleveland hit because 208 − 185 = 23 home runs. Also, Baltimore
hit about 1.5 times as many home runs as Chicago hit because

$\frac{253}{168} \approx 1.5$.

So, these data are at the ratio level.


TRY IT YOURSELF 3

For each data set, determine whether the data are at the interval level or at the
ratio level. Explain your reasoning.

1. The body temperatures (in degrees Fahrenheit) of an athlete during an
exercise session
2. The heart rates (in beats per minute) of an athlete during an exercise session


Answer: Page A31
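The arithmetic in the solution can be reproduced in a few lines of Python. This sketch uses only the legible values from the two data sets; the variable names are illustrative.

```python
yankees_wins = [1923, 1927, 1928, 1932, 1936, 1937, 1938, 1939, 1941, 1943,
                1947, 1949, 1950, 1951, 1952, 1953, 1956, 1958, 1961, 1962,
                1977, 1978, 1996, 1998, 1999, 2000, 2009]
home_runs = {"Baltimore": 253, "Boston": 208, "Chicago": 168, "Cleveland": 185}

# Interval level: the difference between two years is meaningful.
span = yankees_wins[-1] - yankees_wins[0]
print(span)  # 86
# 2009 / 1923 can be computed, but the result has no meaning.

# Ratio level: differences and ratios are both meaningful.
print(home_runs["Boston"] - home_runs["Cleveland"])             # 23
print(round(home_runs["Baltimore"] / home_runs["Chicago"], 1))  # 1.5
```

Python will happily divide any two numbers, so deciding whether a ratio means anything is still a judgment about the level of measurement, not something the code checks.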




The tables below summarize which operations are meaningful at each of the
four levels of measurement. When identifying a data set’s level of measurement,
use the highest level that applies.


Level of       Put data in   Arrange data   Subtract data   Determine whether one data
measurement    categories    in order       entries         entry is a multiple of another
Nominal        Yes           No             No              No
Ordinal        Yes           Yes            No              No
Interval       Yes           Yes            Yes             No
Ratio          Yes           Yes            Yes             Yes


Summary of Four Levels of Measurement


Nominal level (Qualitative data)
Example of a data set: Types of Shows Televised by a Network
   Comedy, Documentaries, Drama, Cooking, Reality Shows, Soap Operas,
   Sports, Talk Shows
Meaningful calculations: Put in a category. For instance, a show televised by
the network could be put into one of the eight categories shown.


Ordinal level (Qualitative or quantitative data)
Example of a data set: Motion Picture Association of America Ratings
   Rating   Description
   G        General Audiences
   PG       Parental Guidance Suggested
   PG-13    Parents Strongly Cautioned
   R        Restricted
   NC-17    No One 17 and Under Admitted
Meaningful calculations: Put in a category and put in order. For instance, a PG
rating has a stronger restriction than a G rating.


Interval level (Quantitative data)
Example of a data set: Average Monthly Temperatures (in degrees Fahrenheit)
for Denver, CO
   Jan 30.7    Jul 74.2
   Feb 32.5    Aug 72.5
   Mar 40.4    Sep 63.4
   Apr 47.4    Oct 50.9
   May 57.1    Nov 38.3
   Jun 67.4    Dec 30.0
   (Source: National Climatic Data Center)
Meaningful calculations: Put in a category, put in order, and find differences
between data entries. For instance, 72.5 − 63.4 = 9.1°F. So, August is 9.1°F
warmer than September.


Ratio level (Quantitative data)
Example of a data set: Average Monthly Precipitation (in inches) for Orlando, FL
   Jan 2.35    Jul 7.27
   Feb 2.38    Aug 7.13
   Mar 3.77    Sep 6.06
   Apr 2.68    Oct 3.31
   May 3.45    Nov 2.17
   Jun 7.58    Dec 2.58
   (Source: National Climatic Data Center)
Meaningful calculations: Put in a category, put in order, find differences
between data entries, and find ratios of data entries. For instance,

$\frac{7.58}{3.77} \approx 2$.

So, there is about twice as much precipitation in June as in March.
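The operations table can be encoded as a small lookup. This is a minimal sketch, and the `Level` enum, the operation names, and the `is_meaningful` helper are hypothetical names chosen for illustration. Because each level permits every operation of the levels below it, the sketch stores only the lowest level at which each operation becomes meaningful and compares positions in the ordering.

```python
from enum import IntEnum


class Level(IntEnum):
    """Levels of measurement, ordered from lowest to highest."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Lowest level at which each operation becomes meaningful.
MINIMUM_LEVEL = {
    "categorize": Level.NOMINAL,
    "order": Level.ORDINAL,
    "subtract": Level.INTERVAL,
    "ratio": Level.RATIO,
}


def is_meaningful(level, operation):
    """Return True if the operation is meaningful for data at this level."""
    return level >= MINIMUM_LEVEL[operation]


# Denver temperatures (interval level): differences yes, ratios no.
print(is_meaningful(Level.INTERVAL, "subtract"))  # True
print(is_meaningful(Level.INTERVAL, "ratio"))     # False
print(round(72.5 - 63.4, 1))                      # 9.1

# Orlando precipitation (ratio level): ratios are meaningful.
print(is_meaningful(Level.RATIO, "ratio"))        # True
print(round(7.58 / 3.77, 1))                      # 2.0
```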


1.2 EXERCISES 




For Extra Help: MyLab Statistics 


Building Basic Skills and Vocabulary 


1. Name each level of measurement for which data can be qualitative.

2. Name each level of measurement for which data can be quantitative.


True or False? In Exercises 3-6, determine whether the statement is true or
false. If it is false, rewrite it as a true statement.


3. Data at the ordinal level are quantitative only.

4. For data at the interval level, you cannot calculate meaningful differences
between data entries.

5. More types of calculations can be performed with data at the nominal level
than with data at the interval level.

6. Data at the ratio level cannot be put in order.


Using and Interpreting Concepts 


Classifying Data by Type In Exercises 7-14, determine whether the data are
qualitative or quantitative. Explain your reasoning.


7. Breeds of horses participating in a horse race
8. American Standard Code for Information Interchange (ASCII) codes
9. Blood pressure levels of athletes participating in a race
10. Speeds of bullet trains
11. Colors of fabrics at a clothing store
12. Widths of veins in different species of leaves
13. Weights of bricks at a construction site
14. Marital statuses mentioned in an employment form


Classifying Data by Level In Exercises 15-20, determine the level of
measurement of the data set. Explain your reasoning.


15. China at Olympics The ranks that China secured at the Summer Olympics
in different years are listed. (Source: International Olympic Committee)

4  1  4  4  3  [remaining entries illegible in source]


16. Business Schools The top ten business schools in the United States for a
recent year according to Forbes are listed. (Source: Forbes Media LLC)

1. Stanford                  6. Chicago (Booth)
2. Harvard                   7. Pennsylvania (Wharton)
3. Northwestern (Kellogg)    8. UC Berkeley (Haas)
4. Columbia                  9. MIT (Sloan)
5. Dartmouth (Tuck)          10. Cornell (Johnson)




17. Flight Departures The flight numbers of 21 departing flights from Chicago 
O’Hare International Airport on an afternoon in October of 2016 are 
listed. (Source: Chicago O’Hare International Airport) 


1785 5159 4509 1575 6827 3486 7676 
1989 522 6868 1893 3133 3337 3266 
3458 334 6320 8385 3112 2110 7664 


18. Movie Times The times of the day when a multiplex shows a popular movie
are listed.
9:00 A.M.    9:10 A.M.    9:25 A.M.    9:40 A.M.
10:35 A.M.   11:25 A.M.   2:30 P.M.    3:45 P.M.
4:45 P.M.    5:30 P.M.    6:00 P.M.    8:00 P.M.
9:30 P.M.    10:00 P.M.   10:20 P.M.


19. Best Sellers List The top ten fiction books on The New York Times Best 
Sellers List on October 9, 2016, are listed. (Source: The New York Times) 


1. The Girl on the Train 6. The Light Between Oceans 

2. Home 7. Immortal Nights 

3. The Kept Woman 8. A Man Called Ove 

4. Magic Binds 9. Thrice the Brinded Cat Hath Mew’d 
5. Commonwealth 10. The Woman in Cabin 10 


20. Chapters The numbers of pages per chapter of a novel are listed.
45 50 52 61 39 41 52 
55 43 28 36 44 48 39 


Graphical Analysis In Exercises 21-24, determine the level of measurement
of the data listed on the horizontal and vertical axes in the figure.


21. What Is the Format of the Books You Read?
[Figure: horizontal axis, Response; vertical axis, Percent] (Source: Pew Research Center)

22. How Many Vacations Are You Planning to Take This Summer?
[Figure: horizontal axis, Number of vacations (0, 1, 2, 3-4, 5 or more);
vertical axis, Percent] (Source: The Harris Poll)

23. Gender Profile of the 114th Congress
[Figure: horizontal axis, Gender (Women, Men)] (Source: Congressional
Research Service)

24. Motor Vehicle Fatalities by Year
[Figure: horizontal axis, Year (2011, 2012, 2013, 2014, 2015)] (Source:
National Highway Traffic Safety Administration)




25. The items below appear on a school’s admission form. Determine the level 
of measurement of the data for each category. 


(a) Gender (b) Previous grade level completed 
(c) Religion (d) Month of birth 

26. The items below appear on a movie rating website. Determine the level of 
measurement of the data for each category. 
(a) Year of release (b) Number of awards won 
(c) Genre (d) Star rating 


Classifying Data by Type and Level In Exercises 27-32, determine
whether the data are qualitative or quantitative, and determine the level of
measurement of the data set.


27. Production The numbers of washing machines produced at six different
manufacturing plants of a multinational company are listed.


10,000 25,000 
14,000 19,000 
21,000 12,000 


28. Olympics The numbers of gold, silver, and bronze medals awarded at the
2008 Olympic Games are listed.


301 302 349 


29. Chess The list of the top 10 chess players in the world released in March 
2018 is given. (Source: World Chess Federation) 


1. Carlsen 6. Vachier-Lagrave 
2. Mamedyarov 7. Nakamura 

3. Kramnik 8. Caruana 

4. So 9. Giri

5. Aronian 10. Anand


30. Diving The scores for the gold medal winning diver in the men’s 10-meter 
platform event from the 2016 Summer Olympics are listed. (Source: 
International Olympic Committee) 


91.80 91.00 88.20 
97.20 99.90 91.80 


31. Concert Tours The top ten highest grossing worldwide concert tours for 
2016 are listed. (Source: Pollstar) 


1. Bruce Springsteen & the E Street Band 6. Justin Bieber 

2. Beyoncé 7. Paul McCartney 

3. Coldplay 8. Garth Brooks 

4. Guns N’ Roses 9. The Rolling Stones 
5. Adele 10. Celine Dion 


32. Numbers of Performances The numbers of performances for the 10 
longest-running Broadway shows at the end of the 2016 season are listed. 
(Source: The Broadway League) 


11,782 8107 7705 7485 6680 
6137 5959 5758 5461 5238 
Extending Concepts 


33. Writing What is an inherent zero? Describe three examples of data sets 
that have inherent zeros and three that do not. 


34. Describe two examples of data sets for each of the four levels of measurement. 
Justify your answer. 


Reputations of Companies in the U.S. 


For more than 50 years, The Harris Poll has conducted surveys using a 
representative sample of people in the United States. The surveys have 
been used to represent the opinions of people in the United States on 
many subjects, such as health, politics, the U.S. economy, and sports. 

Since 1999, The Harris Poll has conducted an annual survey to 
measure the reputations of the most visible companies in the United 
States, as perceived by U.S. adults. The Harris Poll uses a sample of about 
23,000 U.S. adults for the survey. The survey respondents rate companies 
according to 20 attributes that are classified into six categories: (1) social 
responsibility, (2) vision and leadership, (3) financial performance, 
(4) products and services, (5) emotional appeal, and (6) workplace 
environment. This information is used to determine the reputation of 
a company as Excellent, Very Good, Good, Fair, Poor, Very Poor, or 
Critical. The reputations (along with some additional information) of 
10 companies are shown in the table. 


[Venn diagram: All U.S. Adults] Inner region: U.S. adults in The Harris Poll
sample (about 23,000 U.S. adults). Outer region: U.S. adults not in The Harris
Poll sample (about 242.8 million U.S. adults).


Reputations of 10 Companies in the U.S.

Company Name                   Year Formed   Reputation   Industry                     Number of Employees
Amazon.com                     1994          Excellent    Retail                       230,800
Apple, Inc.                    1977          Excellent    Computers and peripherals    116,000
Netflix, Inc.                  1999          Very Good    Internet television          4,700
The Kraft Heinz Co.            [illegible]   Very Good    Food products                41,000
Facebook, Inc.                 [illegible]   Good         Internet                     17,048
Ford Motor Co.                 [illegible]   Good         Automotive                   201,000
Chipotle Mexican Grill, Inc.   [illegible]   Fair         Restaurant                   64,570
Comcast Corp.                  [illegible]   Poor         Cable television             136,000
Exxon Mobil Corp.              [illegible]   Poor         Petroleum (integrated)       71,100
Wells Fargo & Co.              1998          Critical     Banking                      265,000

(Source: The Harris Poll; Amazon.com; Apple, Inc.; Netflix, Inc.; The Kraft Heinz Co.;
Facebook, Inc.; Ford Motor Co.; Chipotle Mexican Grill, Inc.; Comcast Corp.;
Exxon Mobil Corp.; Wells Fargo & Co.)


EXERCISES


1. Sampling Percent What percentage of the total
number of U.S. adults did The Harris Poll sample
for its survey? (Assume the total number of U.S.
adults is 242.8 million.)

2. Nominal Level of Measurement Identify any
column in the table with data at the nominal level.

3. Ordinal Level of Measurement Identify any
column in the table with data at the ordinal level.
Describe two ways that the data can be ordered.

4. Interval Level of Measurement Identify any
column in the table with data at the interval level.
How can these data be ordered?

5. Ratio Level of Measurement Identify any
column in the table with data at the ratio level.

6. Inferences What decisions can be made on the
basis of The Harris Poll survey that measures the
reputations of the most visible companies in the
United States?


SECTION 1.3 Data Collection and Experimental Design 39 


1.3 Data Collection and Experimental Design


What You Should Learn
» How to design a statistical study and how to distinguish between an
  observational study and an experiment
» How to collect data by using a survey or a simulation
» How to design an experiment
» How to create a sample using random sampling, simple random sampling,
  stratified sampling, cluster sampling, and systematic sampling and how
  to identify a biased sample


Design of a Statistical Study • Data Collection • Experimental Design
• Sampling Techniques


Design of a Statistical Study

The goal of every statistical study is to collect data and then use the data to make
a decision. Any decision you make using the results of a statistical study is only
as good as the process used to obtain the data. When the process is flawed, the
resulting decision is questionable.

Although you may never have to develop a statistical study, it is likely that
you will have to interpret the results of one. Before interpreting the results of a
study, however, you should determine whether the results are reliable. In other
words, you should be familiar with how to design a statistical study.


GUIDELINES

Designing a Statistical Study

1. Identify the variable(s) of interest (the focus) and the population of the study.
2. Develop a detailed plan for collecting data. If you use a sample, make
   sure the sample is representative of the population.
3. Collect the data.
4. Describe the data, using descriptive statistics techniques.
5. Interpret the data and make decisions about the population using
   inferential statistics.
6. Identify any possible errors.


A statistical study can usually be categorized as an observational study or 
an experiment. In an observational study, a researcher does not influence the 
responses. In an experiment, a researcher deliberately applies a treatment before 
observing the responses. Here is a brief summary of these types of studies. 


• In an observational study, a researcher observes and measures characteristics
of interest of part of a population but does not change existing conditions. For 
instance, an observational study was conducted in which researchers measured 
the amount of time people spent doing various activities, such as paid work, 
childcare, and socializing. (Source: U.S. Bureau of Labor Statistics) 


• In performing an experiment, a treatment is applied to part of a population,
called a treatment group, and responses are observed. Another part of the 
population may be used as a control group, in which no treatment is applied. 
(The subjects in both groups are called experimental units.) In many cases, 
subjects in the control group are given a placebo, which is a harmless, fake 
treatment that is made to look like the real treatment. The responses of both 
groups can then be compared and studied. In most cases, it is a good idea to 
use the same number of subjects for each group. For instance, an experiment 
was performed in which overweight subjects in a treatment group were given 
the artificial sweetener sucralose to drink while a control group drank water. 
After performing a glucose test, researchers concluded that “sucralose affects 
the glycemic and insulin responses” in overweight people who do not normally 
consume artificial sweeteners. (Source: Diabetes Care) 
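Splitting subjects into a treatment group and a control group of the same size can be sketched as follows. The sketch assumes random assignment (shuffle the subjects, then cut the list in half), which anticipates the randomization element discussed under Experimental Design; the function name `assign_groups` and its `seed` parameter are hypothetical.

```python
import random


def assign_groups(subject_ids, seed=None):
    """Randomly split subjects into equal-size treatment and control groups.

    With an odd number of subjects, the control group gets the extra one.
    """
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# For instance, 140 patients split into two groups of 70.
treatment, control = assign_groups(range(1, 141), seed=1)
print(len(treatment), len(control))  # 70 70
```

Shuffling before cutting means no subject characteristic decides which group a subject lands in, which helps keep the two groups comparable.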


40 


CHAPTER 1 


Introduction to Statistics 


Distinguishing Between an Observational Study 
and an Experiment 
Determine whether each study is an observational study or an experiment. 


1. Researchers study the effect of vitamin D3 supplementation among
patients with antibody deficiency or frequent respiratory tract infections.
To perform the study, 70 patients receive 4000 IU of vitamin D3 daily for
a year. Another group of 70 patients receives a placebo daily for one year.
(Source: British Medical Journal)


2. Researchers conduct a study to determine how confident Americans are in
the U.S. economy. To perform the study, researchers call 3040 U.S. adults
and ask them to rate current U.S. economic conditions and whether the
U.S. economy is getting better or worse. (Source: Gallup)


SOLUTION 


1. Because the study applies a treatment (vitamin D3) to the subjects, the 
study is an experiment. 


2. Because the study does not attempt to influence the responses of the 
subjects (there is no treatment), the study is an observational study. 


TRY IT YOURSELF 1 


The Pennsylvania Game Commission conducted a study to count the number 
of elk in Pennsylvania. The commission captured and released 636 elk, which 
included 350 adult cows, 125 calves, 110 branched bulls, and 51 spikes. Is this 
study an observational study or an experiment? (Source: Pennsylvania Game 
Commission) 

Answer: Page A31 


Data Collection 


There are several ways to collect data. Often, the focus of the study dictates 
the best way to collect data. Here is a brief summary of two methods of data 
collection. 


• A simulation is the use of a mathematical or physical model to reproduce
the conditions of a situation or process. Collecting data often involves the 
use of computers. Simulations allow you to study situations that are impractical 
or even dangerous to create in real life, and often they save time and money. 
For instance, automobile manufacturers use simulations with dummies to 
study the effects of crashes on humans. Throughout this course, you will 
have the opportunity to use applets that simulate statistical processes on 
a computer. 


• A survey is an investigation of one or more characteristics of a population.
Most often, surveys are carried out on people by asking them questions. The 
most common types of surveys are done by interview, Internet, phone, or mail. 
In designing a survey, it is important to word the questions so that they do 
not lead to biased results, which are not representative of a population. For 
instance, a survey is conducted on a sample of female physicians to determine 
whether the primary reason for their career choice is financial stability. In 
designing the survey, it would be acceptable to make a list of reasons and ask 
each individual in the sample to select her first choice. 
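As a small illustration of a simulation, the sketch below uses Python's `random` module as the mathematical model of a fair coin and estimates the proportion of heads. The coin, the function name `simulate_flips`, and the seed are assumptions for illustration only. With a fixed seed the run is repeatable, and for many flips the estimate should land close to 0.5.

```python
import random


def simulate_flips(n_flips, seed=None):
    """Simulate n_flips tosses of a fair coin; return the proportion of heads."""
    rng = random.Random(seed)
    # Model: each toss is heads with probability 0.5.
    heads = sum(1 for _ in range(n_flips) if rng.random() < 0.5)
    return heads / n_flips


for n in (10, 100, 10_000):
    print(n, simulate_flips(n, seed=42))
```

Small runs can stray well away from 0.5, while larger runs tend to settle near it, which is the kind of behavior simulations let you study cheaply.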


Study Tip

The Hawthorne effect occurs in an experiment when subjects change their
behavior simply because they know they are participating in an experiment.


[Figure: Randomized Block Design, with age blocks 30-39 years old, 40-49 years
old, and over 50 years old, each block receiving a treatment]


SECTION 1.3 Data Collection and Experimental Design 41 


Experimental Design 


To produce meaningful, unbiased results, experiments should be carefully
designed and executed. It is important to know what steps should be taken to 
make the results of an experiment valid. Three key elements of a well-designed 
experiment are control, randomization, and replication. 

Because experimental results can be ruined by a variety of factors, being able 
to control these influential factors is important. One such factor is a confounding 
variable. 


DEFINITION

A confounding variable occurs when an experimenter cannot tell the
difference between the effects of different factors on the variable.


For instance, to attract more customers, a coffee shop owner experiments 
by remodeling the shop using bright colors. At the same time, a shopping mall 
nearby has its grand opening. If business at the coffee shop increases, it cannot 
be determined whether it is because of the new colors or the new shopping mall. 
The effects of the colors and the shopping mall have been confounded. 

Another factor that can affect experimental results is the placebo effect. The
placebo effect occurs when a subject reacts favorably to a placebo even though
the subject has, in fact, been given only a fake treatment. To help control or
minimize the placebo effect, a technique called blinding can be used.


DEFINITION

Blinding is a technique where the subjects do not know whether they are
receiving a treatment or a placebo. In a double-blind experiment, neither
the experimenter nor the subjects know whether the subjects are receiving a
treatment or a placebo. The experimenter is informed after all the data have
been collected. This type of experimental design is preferred by researchers.


One challenge for experimenters is assigning subjects to groups so the
groups have similar characteristics (such as age, height, weight, and so on). When
treatment and control groups are similar, experimenters can conclude that any
differences between groups are due to the treatment. To form groups with similar
characteristics, experimenters use randomization.


DEFINITION

Randomization is a process of randomly assigning subjects to different
treatment groups.


In a completely randomized design, subjects are assigned to different
treatment groups through random selection. In some experiments, it may be
necessary for the experimenter to use blocks, which are groups of subjects with
similar characteristics. A commonly used experimental design is a randomized
block design. To use a randomized block design, the experimenter divides the
subjects with similar characteristics into blocks, and then, within each block,
randomly assigns subjects to treatment groups. For instance, an experimenter who
is testing the effects of a new weight loss drink may first divide the subjects into
age categories such as 30-39 years old, 40-49 years old, and over 50 years old,
and then, within each age group, randomly assign subjects to either the treatment
group or the control group (see figure at the left).
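The randomized block procedure can be sketched in a few lines of Python. This is a minimal illustration, assuming hypothetical subject IDs and a helper named `randomized_block_design`; a completely randomized design would shuffle all subjects together, whereas here the shuffle happens separately inside each age block.

```python
import random


def randomized_block_design(blocks, seed=0):
    """Randomly split the subjects in each block into treatment and control groups.

    blocks maps a block label (for example, an age category) to a list of subjects.
    """
    rng = random.Random(seed)
    assignment = {}
    for label, subjects in blocks.items():
        shuffled = list(subjects)  # copy so the caller's list is not reordered
        rng.shuffle(shuffled)      # randomization happens only within this block
        half = len(shuffled) // 2
        assignment[label] = {"treatment": shuffled[:half], "control": shuffled[half:]}
    return assignment


# Hypothetical subject IDs grouped into the age blocks from the weight loss drink example.
blocks = {
    "30-39 years old": ["S01", "S02", "S03", "S04"],
    "40-49 years old": ["S05", "S06", "S07", "S08"],
    "over 50 years old": ["S09", "S10", "S11", "S12"],
}
design = randomized_block_design(blocks)
for label, groups in design.items():
    print(label, groups)
```

Because each block is split on its own, every age category ends up represented in both the treatment group and the control group.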


42


CHAPTER 1 Introduction to Statistics 


Study Tip

The validity of an experiment refers to the accuracy and reliability of the
experimental results. The results of a valid experiment are more likely to be
accepted in the scientific community.


Another type of experimental design is a matched-pairs design, where 
subjects are paired up according to a similarity. One subject in each pair is 
randomly selected to receive one treatment while the other subject receives a 
different treatment. For instance, two subjects may be paired up because of their 
age, geographical location, or a particular physical characteristic. 
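As a sketch of the random step in a matched-pairs design, the Python code below flips a simulated coin for each pair to decide which member receives treatment A. The pairs and the function name `matched_pairs_assignment` are hypothetical; how subjects are matched (age, location, and so on) is assumed to have been decided beforehand.

```python
import random


def matched_pairs_assignment(pairs, seed=0):
    """For each matched pair, randomly choose which subject receives treatment A."""
    rng = random.Random(seed)
    assignment = []
    for first, second in pairs:
        # A fair coin flip decides who in the pair gets treatment A;
        # the other subject in the pair receives treatment B.
        if rng.random() < 0.5:
            assignment.append({"A": first, "B": second})
        else:
            assignment.append({"A": second, "B": first})
    return assignment


# Hypothetical pairs of subjects matched by age.
pairs = [("Ana", "Ben"), ("Cruz", "Dev"), ("Eli", "Fay")]
for pair in matched_pairs_assignment(pairs):
    print(pair)
```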

Sample size, which is the number of subjects in a study, is another important 
part of experimental design. To improve the validity of experimental results, 
replication is required. 


DEFINITION

Replication is the repetition of an experiment under the same or similar
conditions.


For instance, suppose an experiment is designed to test a vaccine against 
a strain of influenza. In the experiment, 10,000 people are given the vaccine 
and another 10,000 people are given a placebo. Because of the sample size, the 
effectiveness of the vaccine would most likely be observed. But, if the subjects in 
the experiment are not selected so that the two groups are similar (according to 
age and gender), the results are of less value. 
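To see why the large sample helps, the Python sketch below simulates the vaccine experiment many times (replication) for a small and a large group size. The infection probabilities of 5% with the placebo and 2% with the vaccine are hypothetical assumptions, not data from the text, and the helper names are illustrative.

```python
import random


def simulate_trial(group_size, p_placebo=0.05, p_vaccine=0.02, seed=0):
    """Simulate one trial; return the infection rates (placebo, vaccine).

    p_placebo and p_vaccine are hypothetical infection probabilities.
    """
    rng = random.Random(seed)
    placebo_cases = sum(1 for _ in range(group_size) if rng.random() < p_placebo)
    vaccine_cases = sum(1 for _ in range(group_size) if rng.random() < p_vaccine)
    return placebo_cases / group_size, vaccine_cases / group_size


def share_showing_effect(group_size, replications=20):
    """Replicate the trial; return the share of runs where the vaccine group did better."""
    better = 0
    for rep in range(replications):
        placebo_rate, vaccine_rate = simulate_trial(group_size, seed=rep)
        if vaccine_rate < placebo_rate:
            better += 1
    return better / replications


print("10 per group:", share_showing_effect(10))
print("10,000 per group:", share_showing_effect(10_000))
```

Under these assumed rates, small groups often show no difference at all just by chance, while groups of 10,000 show the vaccine group ahead in essentially every replication.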


EXAMPLE 2

Analyzing an Experimental Design


A company wants to test the effectiveness of a new gum developed to help 
people quit smoking. Identify a potential problem with each experimental 
design and suggest a way to improve it. 


1. The company identifies ten adults who are heavy smokers. Five of the 
subjects are given the new gum and the other five subjects are given a 
placebo. After two months, the subjects are evaluated and it is found that 
the five subjects using the new gum have quit smoking. 


2. The company identifies one thousand adults who are heavy smokers. The 
subjects are divided into blocks according to gender. Females are given the 
new gum and males are given the placebo. After two months, a significant 
number of the female subjects have quit smoking. 


SOLUTION 


1. The sample size being used is not large enough to validate the results of 
the experiment. The experiment must be replicated to improve the validity. 


2. The groups are not similar. The new gum may have a greater effect on 
women than on men, or vice versa. The subjects can be divided into blocks 
according to gender, but then, within each block, they should be randomly 
assigned to be in the treatment group or in the control group. 


TRY IT YOURSELF 2 


The company in Example 2 identifies 240 adults who are heavy smokers. 
The subjects are randomly assigned to be in a gum treatment group or in a 
control group. Each subject is also given a DVD featuring the dangers of 
smoking. After four months, most of the subjects in the treatment group have 
quit smoking. Identify a potential problem with the experimental design and 
suggest a way to improve it. 

Answer: Page A31 


Study Tip

A biased sample is one that is not representative of the population from
which it is drawn. For instance, a sample consisting of only 18- to
22-year-old U.S. college students would not be representative of the entire
18- to 22-year-old population in the United States.


To explore this topic further, 
see Activity 1.3 on page 49. 


Tech Tip

You can use technology such as Minitab, Excel, StatCrunch, or the TI-84 Plus
to generate random numbers. (Detailed instructions for using Minitab, Excel,
and the TI-84 Plus are shown in the technology manuals that accompany this
text.) For instance, here are instructions for using the random integer
generator on a TI-84 Plus for Example 3.

MATH
Choose the PRB menu.
5: randInt(
1 , 731 , 8 )
ENTER

randInt(1,731,8)
{357 35 249 ...

Continuing to press ENTER will generate more random samples of 8 integers.
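If you have Python rather than a TI-84 Plus, a rough equivalent uses the standard `random` module. This is a sketch, not the calculator's own algorithm, so the numbers will differ; the helper name `rand_int_list` is hypothetical. As written, it can return the same number twice, so the last line shows `random.sample` for when each student may be chosen only once.

```python
import random


def rand_int_list(low, high, count, seed=None):
    """Return `count` random integers from low to high, inclusive; values may repeat."""
    rng = random.Random(seed)
    return [rng.randint(low, high) for _ in range(count)]


# Analogue of randInt(1,731,8): eight integers from 1 to 731 for Example 3.
print(rand_int_list(1, 731, 8))

# When a student must not be chosen twice, draw without replacement instead.
print(sorted(random.sample(range(1, 732), 8)))
```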




Sampling Techniques 


A census is a count or measure of an entire population. Taking a census provides 
complete information, but it is often costly and difficult to perform. A sampling is a 
count or measure of part of a population and is more commonly used in statistical 
studies. To collect unbiased data, a researcher must ensure that the sample is 
representative of the population. Appropriate sampling techniques must be used to 
ensure that inferences about the population are valid. Remember that when a study 
is done with faulty data, the results are questionable. Even with the best methods of 
sampling, a sampling error may occur. A sampling error is the difference between 
the results of a sample and those of the population. When you learn about inferential 
statistics, you will learn techniques for controlling sampling errors.
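The difference between a census and a sample, and the resulting sampling error, can be illustrated with a small Python sketch. The population of household sizes below is randomly generated and entirely hypothetical; the point is only that the sample mean usually differs somewhat from the census (population) mean.

```python
import random
from statistics import mean

rng = random.Random(42)

# Hypothetical population: number of people living in each of 1,000 households.
population = [rng.randint(1, 6) for _ in range(1000)]

population_mean = mean(population)   # the value a census would give
sample = rng.sample(population, 50)  # a simple random sample of 50 households
sample_mean = mean(sample)           # the value the sample gives

sampling_error = sample_mean - population_mean
print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
print(f"sampling error:  {sampling_error:+.2f}")
```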

A random sample is one in which every member of the population has an 
equal chance of being selected. A simple random sample is a sample in which 
every possible sample of the same size has the same chance of being selected. 
One way to collect a simple random sample is to assign a different number to 
each member of the population and then use a random number table like Table 1 
in Appendix B. Responses, counts, or measures for members of the population 
whose numbers correspond to those generated using the table would be in the 
sample. Calculators and computer software programs are also used to generate 
random numbers (see page 58). 


Table 1—Random Numbers 


92630 78240 19267 95457 53497 23894 37708 79862 
79445 78735 71549 44843 26104 67318 00701 34986 
59654 71966 27386 50004 05358 94031 29281 18544 
31524 49587 76612 39789 13537 48086 59483 60680 
06348 76938 90379 51392 55887 71015 09209 79157 


Portion of Table 1 found in Appendix B 


Consider a study of the number of people who live in West Ridge County. To 
use a simple random sample to count the number of people who live in West Ridge 
County households, you could assign a different number to each household, use 
a technology tool or table of random numbers to generate a sample of numbers, 
and then count the number of people living in each selected household. 


EXAMPLE 3

Using a Simple Random Sample


There are 731 students currently enrolled in a statistics course at your school. 
You wish to form a sample of eight students to answer some survey questions. 
Select the students who will belong to the simple random sample. 


SOLUTION 


Assign numbers 1 to 731 to the students in the course. In the table of random 
numbers, choose a starting place at random and read the digits in groups of 
three (because 731 is a three-digit number). For instance, if you started in the 
third row of the table at the beginning of the second column, you would group 
the numbers as follows: 


719|66 2|738|6 50|004| 053|58 9|403|1 29|281| 185|44 


Ignoring numbers greater than 731, the first eight numbers are 719, 662, 650, 4, 53, 
589, 403, and 129. The students assigned these numbers will make up the sample. 
To find the sample using a TI-84 Plus, follow the instructions shown at the left. 
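The table procedure in the solution can also be automated. The Python sketch below reads a string of table digits in three-digit groups and ignores 000 and any number greater than 731, as in the solution. Skipping a number that was already chosen is an added assumption (sampling without replacement), and the function name `numbers_from_table` is hypothetical.

```python
def numbers_from_table(digits, width, max_value, how_many):
    """Read `digits` in groups of `width`; keep values from 1 to max_value until `how_many` are found.

    Repeated values are skipped, so the result is a sample without replacement.
    """
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        value = int(digits[start:start + width])
        if 1 <= value <= max_value and value not in chosen:
            chosen.append(value)
            if len(chosen) == how_many:
                break
    return chosen


# Third row of the table, starting at the beginning of the second column.
row3 = "71966 27386 50004 05358 94031 29281 18544".replace(" ", "")
print(numbers_from_table(row3, width=3, max_value=731, how_many=8))
# → [719, 662, 650, 4, 53, 589, 403, 129]
```

The group 738 is discarded because it exceeds 731, which is why 650 follows 662, matching the eight numbers listed in the solution.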




Study Tip

Be sure you understand that stratified sampling randomly selects a sample of
members from all strata. Cluster sampling uses all members from a randomly
selected sample of clusters (but not all, so some clusters will not be part of
the sample). For instance, in the figure for “Stratified Sampling” at the right,
a sample of households in West Ridge County is randomly selected from all three
income groups. In the figure for “Cluster Sampling,” all households in a
randomly selected cluster (Zone 1) are used. (Notice that the other zones are
not part of the sample.)




TRY IT YOURSELF 3 


A company employs 79 people. Choose a simple random sample of five to 
survey. 
Answer: Page A31 


When you choose members of a sample, you should decide whether it is 
acceptable to have the same population member selected more than once. If it 
is acceptable, then the sampling process is said to be with replacement. If it is not 
acceptable, then the sampling process is said to be without replacement. 
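In Python's standard `random` module, the two sampling processes correspond to two different functions: `choices` draws with replacement and `sample` draws without replacement. The sketch below assumes hypothetical employee ID numbers 1 to 79, echoing the company in Try It Yourself 3.

```python
import random

rng = random.Random(7)
employee_ids = list(range(1, 80))  # hypothetical ID numbers 1-79, one per employee

# With replacement: each pick is made from the full list, so an ID may appear twice.
with_replacement = rng.choices(employee_ids, k=5)

# Without replacement: once an ID is picked, it cannot be picked again.
without_replacement = rng.sample(employee_ids, k=5)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```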

There are several other commonly used sampling techniques. Each has 
advantages and disadvantages. 


• Stratified Sample: When it is important for the sample to have members
from each segment of the population, you should use a stratified sample.
Depending on the focus of the study, members of the population are divided
into two or more subsets, called strata, that share a similar characteristic
such as age, gender, ethnicity, or even political preference. A sample is then
randomly selected from each of the strata. Using a stratified sample ensures
that each segment of the population is represented. For instance, to collect
a stratified sample of the number of people who live in West Ridge County
households, you could divide the households into socioeconomic levels and
then randomly select households from each level. In using a stratified sample,
care must be taken to ensure that all strata are sampled in proportion to their
actual percentages of occurrence in the population. For instance, if 40% of
the people in West Ridge County belong to the low-income group, then 40% of
the sample should come from this group.


Figure: Stratified Sampling (Group 1: Low income; Group 2: Middle income;
Group 3: High income)


• Cluster Sample: When the population falls into naturally occurring subgroups,
each having similar characteristics, a cluster sample may be the most
appropriate. To select a cluster sample, divide the population into groups,
called clusters, and select all of the members in one or more (but not all) of the
clusters. Examples of clusters could be different sections of the same course
or different branches of a bank. For instance, to collect a cluster sample of
the number of people who live in West Ridge County households, divide the
households into groups according to zip codes, then select all the households
in one or more, but not all, zip codes and count the number of people living in
each household. In using a cluster sample, care must be taken to ensure that
all clusters have similar characteristics. For instance, if one of the zip code
clusters has a greater proportion of high-income people, the data might not be
representative of the population.


Figure: Cluster Sampling (Zip Code Zones in West Ridge County)


Picturing the World

The research firm Gallup conducts many polls (or surveys) regarding the
president, Congress, and political and nonpolitical issues. A commonly cited
Gallup poll is the public approval rating of the president. For instance, the
approval ratings for President Barack Obama for selected months in 2016 are
shown in the figure. (Each rating is from the poll conducted at the end of the
indicated month.)

Figure: President’s Approval Ratings, 2016 (vertical axis: Percent approving;
horizontal axis: Month, with Jan, Apr, Jul, and Oct shown)


Discuss some ways that Gallup 
could select a biased sample to 
conduct a poll. How could Gallup 
select a sample that is unbiased? 




• Systematic Sample: A systematic sample is a sample in which each member
of the population is assigned a number. The members of the population are
ordered in some way, a starting number is randomly selected, and then sample
members are selected at regular intervals from the starting number. (For
example, every 3rd, 5th, or 100th member is selected.) For instance, to collect
a systematic sample of the number of people who live in West Ridge County
households, you could assign a different number to each household, randomly
choose a starting number, select every 100th household, and count the number
of people living in each. An advantage of systematic sampling is that it is easy
to use. However, when the data contain a regularly occurring pattern, this type
of sampling should be avoided, because the selection interval could line up with
the pattern and produce a biased sample.


Figure: Systematic Sampling (randomly choose a starting number, such as 86,
then select every 100th household: 86, 186, 286, 386, ...)
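The stratified, cluster, and systematic techniques described above can be sketched in Python. This is a minimal illustration under stated assumptions: a hypothetical county of 1,000 numbered households, assumed income shares of 40%, 45%, and 15%, and four equal zip code zones. The helper names are illustrative, not from any library.

```python
import random


def stratified_sample(strata, fraction, rng):
    """Randomly select the same fraction of members from every stratum."""
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)  # keeps each stratum in proportion
        sample.extend(rng.sample(members, k))
    return sample


def cluster_sample(clusters, num_clusters, rng):
    """Randomly choose some (not all) clusters and keep every member of each."""
    chosen = rng.sample(sorted(clusters), num_clusters)
    return [member for name in chosen for member in clusters[name]]


def systematic_sample(ordered_members, interval, rng):
    """Pick a random starting position, then take every `interval`-th member."""
    start = rng.randrange(interval)  # random start within the first interval
    return ordered_members[start::interval]


# Hypothetical county: households numbered 1-1000.
households = list(range(1, 1001))
strata = {  # assumed shares: 40% low, 45% middle, 15% high income
    "low income": households[:400],
    "middle income": households[400:850],
    "high income": households[850:],
}
zones = {f"Zone {i + 1}": households[250 * i:250 * (i + 1)] for i in range(4)}

rng = random.Random(2019)
print(len(stratified_sample(strata, 0.10, rng)))  # 40 + 45 + 15 = 100 households
print(len(cluster_sample(zones, 1, rng)))          # all 250 households in one zone
print(systematic_sample(households, 100, rng))     # 10 households, spaced 100 apart
```

Note how the stratified helper draws from every stratum, while the cluster helper keeps entire zones and leaves the other zones out, which is the distinction the Study Tip emphasizes.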
A type of sample that often leads to biased studies (so it is not recommended)
is a convenience sample. A convenience sample consists only of members of the
population that are easy to get.


EXAMPLE 4

Identifying Sampling Techniques

You are doing a study to determine the opinions of students at your school
regarding stem cell research. Identify the sampling technique you are using
when you select the samples listed. Discuss potential sources of bias (if any).


1. You divide the student population with respect to majors and randomly 
select and question some students in each major. 


2. You assign each student a number and generate random numbers. You then 
question each student whose number is randomly selected. 


3. You select students who are in your biology class. 


SOLUTION 


1. Because students are divided into strata (majors) and a sample is selected 
from each major, this is a stratified sample. 


2. Each sample of the same size has an equal chance of being selected and each 
student has an equal chance of being selected, so this is a simple random sample. 


3. Because the sample is taken from students who are readily available, this is
a convenience sample. The sample may be biased because biology students 
may be more familiar with stem cell research than other students and may 
have stronger opinions. 


TRY IT YOURSELF 4 


You want to determine the opinions of students regarding stem cell research. 
Identify the sampling technique you are using when you select these samples. 


1. You select a class at random and question each student in the class. 


2. You assign each student a number and, after choosing a starting number,
question every 25th student.

Answer: Page A31




1.3 EXERCISES 


For Extra Help: MyLab Statistics 


Building Basic Skills and Vocabulary 


1. What is the difference between an observational study and an experiment?
2. What is the difference between a census and a sampling?
3. What is the difference between a random sample and a simple random sample?
4. What is replication in an experiment? Why is replication important?


True or False? In Exercises 5-10, determine whether the statement is true or
false. If it is false, rewrite it as a true statement.


5. A placebo is an actual treatment.
6. A double-blind experiment is used to increase the placebo effect.
7. Using a systematic sample guarantees that members of each group within a
population will be sampled.
8. A convenience sample is always representative of a population.
9. The method for selecting a stratified sample is to order a population in some
way and then select members of the population at regular intervals.
10. To select a cluster sample, divide a population into groups and then select all
of the members in at least one (but not all) of the groups.


Distinguishing Between an Observational Study and an Experiment 
In Exercises 11-14, determine whether the study is an observational study or an 
experiment. Explain. 


11. In a survey of 1033 U.S. adults, 51% said U.S. presidents should release all
medical information that might affect their ability to serve. (Source: Gallup)

12. Researchers demonstrated that adults using an intensive program to lower
systolic blood pressure to less than 120 millimeters of mercury reduce the
risk of death from all causes by 27%. (Source: American Heart Association)

13. To study the effects of social media on teenagers’ brains, researchers showed
a few dozen teenagers photographs that had varying numbers of “likes”
while scanning the reactions in their brains. (Source: NPR)

14. In a study designed to research the effect of music on driving habits,
1000 motorists ages 17-25 years old were asked whether the music they
listened to influenced their driving. (Source: More Th>n)

15. Random Number Table Use the sixth row of Table 1 in Appendix B to
generate 12 random numbers between 1 and 99.

16. Random Number Table Use the tenth row of Table 1 in Appendix B to
generate 10 random numbers between 1 and 920.

Random Numbers In Exercises 17 and 18, use technology to generate the
random numbers.

17. Fifteen numbers between 1 and 150
18. Nineteen numbers between 1 and 1000




Using and Interpreting Concepts 


19. Migraine Drug A pharmaceutical company wants to test the effectiveness
of a new drug used to treat migraine headaches. The company identifies
500 females ages 25 to 45 years old who suffer from migraine headaches. The
subjects are randomly assigned to two groups. One group is given the drug
and the other is given a placebo that looks exactly like the drug. After three
months, the subjects’ symptoms are studied and compared.

(a) Identify the experimental units and treatments used in this experiment.

(b) Identify a potential problem with the experimental design being used
and suggest a way to improve it.

(c) How could this experiment be designed to be double-blind?

20. Dietary Supplement Researchers in Germany tested the effect of a dietary
supplement designed to control metabolism in patients with type 2 diabetes.
Thirty-one patients with type 2 diabetes completed the study. The patients
were randomly assigned to receive either the supplement or a placebo for
12 weeks. After a subsequent “wash-out” period of 12 weeks, the patients were
given the other product. At the conclusion of the study, the patients’ glycated
hemoglobin, fasting blood glucose, and fructosamine levels were checked, as
well as their lipid parameters. (Source: Food and Nutrition Research)

(a) Identify the experimental units and treatments used in this experiment.

(b) Identify a potential problem with the experimental design being used
and suggest a way to improve it.

(c) The experiment is described as a placebo-controlled, double-blind study.
Explain what this means.

(d) How could blocking be used in designing this experiment?

21. Dieting A researcher wants to study the effects of dieting on obesity.
Eighteen people volunteer for the experiment: Lewis, Alice, Raj, William,
Edwin, Mercer, Edgar, Bill, Zoya, Kate, Lara, Bertha, Dennis, Jennifer,
Ahmed, Ronald, Harry, and Arthur. Use a random number generator to
choose nine subjects for the treatment group. The other nine subjects will go
into the control group. List the subjects in each group. Tell which method
you used to generate the random numbers.

22. Using a Simple Random Sample Participants of an experiment are
numbered from 1 to 80. The participants are to be randomly assigned to two
different groups. Use a random number generator different from the one
you used in Exercise 21 to choose 40 participants for the treatment group.
The other 40 participants will go into the control group. List the participants,
according to number, in each group. Tell which method you used to generate
the random numbers.


Identifying Sampling Techniques In Exercises 23-28, identify the 
sampling technique used, and discuss potential sources of bias (if any). Explain. 


23. After an election, a constituency is divided into 50 equal areas. Twelve of the
areas are selected, and every occupied household in each selected area is
interviewed to help focus political efforts on what residents require the most.

24. Questioning university students as they leave a fraternity party, a researcher
asks 463 students about their study habits.

25. Selecting employees at random from an employee directory, researchers
contact 300 people and ask what obstacles (such as computer problems)
keep them from accomplishing tasks at work.




26. A sample executive is chosen from each department of an organization for a 
survey. 


27. From visits made to randomly generated house numbers, 1638 residents are 
asked if they own a vehicle or not. 


28. Every sixth customer entering an ice cream parlor is asked to name his or her 
favorite flavor of ice cream. 


Choosing Between a Census and a Sampling In Exercises 29 and 30,
determine whether you would take a census or use a sampling. If you would use a 
sampling, determine which sampling technique you would use. Explain. 


29. The most popular model of mobile phone among 400,000 mobile phone
purchasers.

30. The average height of the 264 students of a high school.


Recognizing a Biased Question In Exercises 31-34, determine whether the
survey question is biased. If the question is biased, suggest a better wording. 


31. Why does eating whole-grain foods improve your health? 
32. How much water do you drink on an average day? 
33. Why does listening to music while studying increase the chances of retention? 


34. What do you think about the battery backup of the mobile phone? 


Extending Concepts 

35. Analyzing a Study Find an article or a news story that describes a statistical 
study. 
(a) Identify the population and the sample. 


(b) Classify the data as qualitative or quantitative. Determine the level of 
measurement. 


(c) Is the study an observational study or an experiment? If it is an 
experiment, identify the treatment. 


(d) Identify the sampling technique used to collect the data. 


36. Designing and Analyzing a Study Design a study for some subject that is of 
interest to you. Answer parts (a)-(d) of Exercise 35 for this study.


37. Open and Closed Questions Two types of survey questions are open 
questions and closed questions. An open question allows for any kind 
of response; a closed question allows for only a fixed response. An open 
question and a closed question with its possible choices are given below. List 
an advantage and a disadvantage of each question. 


Open Question What can be done to get students to eat healthier foods? 
Closed Question How would you get students to eat healthier foods? 
1. Mandatory nutrition course 
2. Offer only healthy foods in the cafeteria and remove unhealthy foods 
3. Offer more healthy foods in the cafeteria and raise the prices on 
unhealthy foods 


38. Natural Experiments Observational studies are sometimes referred to as 
natural experiments. Explain, in your own words, what this means. 


ACTIVITY 1.3

Random Numbers

APPLET You can find the interactive applet for this activity within
MyLab Statistics or at www.pearsonglobaleditions.com.


The random numbers applet is designed to allow you to generate random 
numbers from a range of values. You can specify integer values for the 
minimum value, maximum value, and the number of samples in the 
appropriate fields. You should not use decimal points when filling in the 
fields. When SAMPLE is clicked, the applet generates random values, which 
are displayed as a list in the text field. 


Figure: applet screen with Minimum value, Maximum value, and Number of samples
fields and a SAMPLE button


Step 1 Specify a minimum value. 

Step 2 Specify a maximum value. 

Step 3 Specify the number of samples. 

Step 4 Click SAMPLE to generate a list of random values. 
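The applet's internal method is not described here, so the Python sketch below is an assumed stand-in that, like the lists in Draw Conclusions 1, samples with replacement. It also shows one way to check a list for the evidence asked about in question 1: a repeated value. The helper names `random_numbers` and `shows_replacement` are hypothetical.

```python
import random


def random_numbers(minimum, maximum, num_samples, rng):
    """Mimic the applet: num_samples integers from minimum to maximum, inclusive."""
    return [rng.randint(minimum, maximum) for _ in range(num_samples)]


def shows_replacement(values):
    """True if some value repeats, which can only happen when sampling with replacement."""
    return len(set(values)) < len(values)


# Draw Conclusions 1: generate lists of 8 values from 1 to 20 until one has a repeat.
rng = random.Random(3)
attempts = 1
values = random_numbers(1, 20, 8, rng)
while not shows_replacement(values):
    values = random_numbers(1, 20, 8, rng)
    attempts += 1
print(f"list {attempts}: {values}")
```

A list with no repeats does not prove the sampling was without replacement; only a repeat gives definite evidence, which is why the loop keeps generating lists.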


DRAW CONCLUSIONS 


1. Specify the minimum, maximum, and number of samples to be 1, 20, and 8, 
respectively, as shown. Run the applet. Continue generating lists until you 
obtain one that shows that the random sample is taken with replacement. 
Write down this list. How do you know that the list is a random sample 
taken with replacement? 


Figure: applet settings (Minimum value: 1; Maximum value: 20; Number of samples: 8)


2. Use the applet to repeat Example 3 on page 43. What values did you use 
for the minimum, maximum, and number of samples? Which method do 
you prefer? Explain. 




Uses and Abuses: Statistics in the Real World

Uses


An experiment studied 321 women with advanced breast cancer. All of the 
women had been previously treated with other drugs, but the cancer had stopped 
responding to the medications. The women were then given the opportunity to 
take a new drug combined with a chemotherapy drug. 

The subjects were divided into two groups, one that took the new drug 
combined with a chemotherapy drug, and one that took only the chemotherapy 
drug. After three years, results showed that the new drug in combination with the 
chemotherapy drug delayed the progression of cancer in the subjects. The results 
were so significant that the study was stopped, and the new drug was offered to 
all women in the study. The Food and Drug Administration has since approved 
use of the new drug in conjunction with a chemotherapy drug. 


Abuses 


For four years, one hundred eighty thousand teenagers in Norway were used 
as subjects to test a new vaccine against the deadly meningococcus B bacterium.
A brochure describing the possible effects of the vaccine stated, “it is unlikely 
to expect serious complications,” while information provided to the Norwegian 
Parliament stated, “serious side effects can not be excluded.” The vaccine trial 
had some disastrous results: More than 500 side effects were reported, with some 
considered serious, and several of the subjects developed serious neurological 
diseases. The results showed that the vaccine was providing immunity in only 
57% of the cases. This result was not sufficient for the vaccine to be added to 
Norway’s vaccination program. Compensations have since been paid to the 
vaccine victims. 


Ethics 


Experiments help us further understand the world that surrounds us. But, in 
some cases, they can do more harm than good. In the Norwegian experiments, 
several ethical questions arise. Was the Norwegian experiment unethical if the 
best interests of the subjects were neglected? When should the experiment have 
been stopped? Should it have been conducted at all? When serious side effects 
are not reported and are withheld from subjects, there is no ethical question
here; it is simply wrong.

On the other hand, the breast cancer researchers would not want to deny 
the new drug to a group of patients with a life-threatening disease. But again, 
questions arise. How long must a researcher continue an experiment that shows 
better-than-expected results? How soon can a researcher conclude a drug is safe 
for the subjects involved? 


EXERCISES 


1. Find an example of a real-life experiment other than the one described above 
that may be considered an “abuse.” What could have been done to avoid the 
outcome of the experiment? 


2. Stopping an Experiment In your opinion, what are some problems that may 
arise when clinical trials of a new experimental drug or vaccine are stopped 
early and then the drug or vaccine is distributed to other subjects or patients? 




Chapter Summary


What Did You Learn? | Example(s) | Review Exercises

Section 1.1
• How to distinguish between a population and a sample | 1 | 1-4
• How to distinguish between a parameter and a statistic | 2 | 5-8
• How to distinguish between descriptive statistics and inferential statistics | 3 | 9, 10

Section 1.2
• How to distinguish between qualitative data and quantitative data | 1 | 11-14
• How to classify data with respect to the four levels of measurement: nominal,
ordinal, interval, and ratio | 2, 3 | 15-18

Level of measurement | Put data in categories | Arrange data in order | Subtract data entries | Determine whether one data entry is a multiple of another
Nominal | Yes | No | No | No
Ordinal | Yes | Yes | No | No
Interval | Yes | Yes | Yes | No
Ratio | Yes | Yes | Yes | Yes

Section 1.3
• How to design a statistical study and how to distinguish between an
observational study and an experiment | 1 | 19, 20
• How to design an experiment | 2 | 21, 22
• How to create a sample using random sampling, simple random sampling,
stratified sampling, cluster sampling, and systematic sampling and how to
identify a biased sample | 3, 4 | 23-29


Sampling Techniques 


Random: A sample in which every member of a population has an equal chance 
of being selected. 


Simple random: A sample in which every possible sample of the same size has 
the same chance of being selected from a population. 


Stratified: Members of a population are divided into two or more subsets, called 
strata, that share a similar characteristic. A sample is then randomly selected 
from each of the strata. Using a stratified sample ensures that each segment of 
the population is represented. 


Cluster: The population is divided into groups (or clusters) and all of the 
members in one or more (but not all) of the clusters are selected. To avoid 
a biased sample, care must be taken to ensure that all clusters have similar 
characteristics. 
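The key difference from stratified sampling is that entire clusters are chosen at random, and every member of a chosen cluster is kept. The sketch below assumes a hypothetical town of 10 districts with 5 households each; the helper `cluster_sample` is also hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose n_clusters whole clusters and keep every member of each one."""
    if not 0 < n_clusters < len(clusters):
        raise ValueError("choose one or more, but not all, of the clusters")
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sort keys so a seed gives a repeatable draw
    return {name: list(clusters[name]) for name in chosen}


# Hypothetical town: 10 districts with 5 households each.
districts = {
    f"district {d}": [f"household {d}-{h}" for h in range(1, 6)]
    for d in range(1, 11)
}

sample = cluster_sample(districts, 4, seed=3)
for name, households in sample.items():
    print(name, len(households))
```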


Systematic: Each member of a population is assigned a number. The members of 
the population are ordered in some way, a starting number is randomly selected, 
and then sample members are selected at regular intervals from the starting 
number. (For instance, every 3rd, 5th, or 100th member is selected.) 
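Once the population is in order, a systematic sample needs only one random number: the starting position. Slicing with a step of k then picks every kth member. The sketch below assumes hypothetical data (100 parts numbered in assembly-line order, with k = 10), and the helper `systematic_sample` is hypothetical.

```python
import random


def systematic_sample(ordered_population, k, seed=None):
    """Choose a random start among the first k members, then take every k-th member."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # random starting position 0, 1, ..., k - 1
    return ordered_population[start::k]


parts = list(range(1, 101))  # hypothetical: 100 parts numbered in assembly-line order
sample = systematic_sample(parts, 10, seed=4)
print(sample)
```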




Review Exercises


Section 1.1 


In Exercises 1-4, identify the population and the sample. Describe the sample data set. 


1. A survey of 4787 U.S. adults found that 15% use ride-hailing applications. 
(Source: Pew Research Center) 


2. Forty-two professors in Pennsylvania were surveyed concerning their 
opinions of the current education policy of the state. 


3. A survey of 2223 U.S. adults found that 62% would encourage a child to 
pursue a career as a video game developer or designer. (Source: The Harris Poll) 


4. A survey of 1601 U.S. children and adults ages 16 years and older found 
that 48% have visited a public library or a bookmobile over a recent span of 
12 months. (Source: Pew Research Center) 


In Exercises 5-8, determine whether the number describes a population parameter
or a sample statistic. Explain your reasoning.


5. In 2016, the National Science Foundation announced $22.7 million in 
infrastructure-strengthening investments. (Source: National Science Foundation) 


6. In a survey of 1000 likely U.S. voters, 29% trust media fact-checking of 
candidates’ comments. (Source: Rasmussen Reports) 


7. In a recent study of physics majors at a university, 12 students were minoring
in math.


8. Thirty percent of a sample of 521 U.S. workers say that they worry about 
having their benefits reduced. (Source: Gallup) 

9. Which part of the survey described in Exercise 3 represents the descriptive 
branch of statistics? Make an inference based on the results of the survey. 


10. Which part of the survey described in Exercise 4 represents the descriptive 
branch of statistics? Make an inference based on the results of the survey. 


Section 1.2 


In Exercises 11-14, determine whether the data are qualitative or quantitative. 
Explain your reasoning. 


11. The ages of a sample of 430 employees of a software company 
12. The IQ levels of the students of a secondary school 
13. The revenues of the companies on the Fortune 500 list 


14. The genders of a sample of 1,000 students of a university 


In Exercises 15-18, determine the level of measurement of the data set. Explain. 


15. The daily high temperatures (in degrees Fahrenheit) for Sacramento, California, 
for a week in September are listed. (Source: National Climatic Data Center) 


90 80 76 84 91 94 97 
16. The income groups for a sample of city residents are listed. 


Low Middle High 




17. The four departments of a printing company are listed. 
Administration Sales Production Billing 


18. The total compensations (in millions of dollars) of the ten highest-paid 
CEOs at U.S. public companies are listed. (Source: Equilar, Inc.) 


94.6 56.4 54.1 53.2 53.2 51.6 47.5 43.5 39.2 37.0


Section 1.3 


In Exercises 19 and 20, determine whether the study is an observational study or 
an experiment. Explain. 


19. Researchers conduct a study to determine whether a drug used to treat 
hypertension in patients with obstructive sleep apnea works better when taken 
in the morning or in the evening. To perform the study, 78 patients are given one 
pill to take in the morning and one pill to take in the evening (one containing 
the drug and the other a placebo). After 6 weeks, researchers collected blood 
pressure information on the patients. (Source: American Thoracic Society) 


20. Researchers conduct a study to determine the effect of coffee consumption 
on the development of multiple sclerosis. To perform the study, researchers 
asked 4408 adults in Sweden and 2331 adults in the United States how 
many cups of coffee they drink per day. (Source: American Association for the 
Advancement of Science) 


In Exercises 21 and 22, two hundred students volunteer for an experiment to test 
the effects of sleep deprivation on memory recall. The students will be placed in one 
of five different treatment groups, including the control group. 


21. Explain how you could design an experiment so that it uses a randomized 
block design. 


22. Explain how you could design an experiment so that it uses a completely 
randomized design. 


In Exercises 23-28, identify the sampling technique used, and discuss potential 
sources of bias (if any). Explain. 


23. Using random digit dialing, researchers ask 1201 U.S. adults whether enough 
is being done to fight opioid addiction. (Source: Kaiser Family Foundation) 


24. A professor asks 20 students to participate in a student reaction survey. 


25. A study in a town in northwest Ethiopia designed to determine prevalence 
and predictors of depression among pregnant women randomly selects four 
districts of the town, then interviews all pregnant women in these districts. 
(Source: Public Library of Science) 


26. A researcher surveys every tenth house for average family incomes. 


27. Fifty voters are randomly selected from each religious group in a state and 
surveyed about their preferred political party. 


28. A government official surveys 150 students of a school in Shanghai to study 
the eating habits of school-going children in the city. 


29. Use the seventh row of Table 1 in Appendix B to generate 6 random 
numbers between 1 and 600. 




Chapter Quiz


Take this quiz as you would take a quiz in class. After you are done, check your 
work against the answers given in the back of the book. 




1. A study of the dietary habits of 359,264 Korean adolescents was conducted
to find a link between dietary habits and school performance. Identify the
population and the sample in the study. (Source: Wolters Kluwer Health, Inc.)


2. Determine whether each number describes a population parameter or a
sample statistic. Explain your reasoning.
(a) A survey of 1000 U.S. adults found that 52% think that the introduction 
of driverless cars will make roads less safe. (Source: Rasmussen Reports) 


(b) At a college, 90% of the members of the Board of Trustees approved the
contract of the new president.


(c) A survey of 727 small business owners found that 25% reported job openings
they could not fill. (Source: National Federation of Independent Business)


3. Determine whether the data are qualitative or quantitative. Explain. 


(a) A list of debit card personal identification numbers 
(b) The final scores on a video game 


4. Determine the level of measurement of the data set. Explain your reasoning. 


(a) A list of badge numbers of police officers at a precinct 

(b) The horsepowers of racing car engines 

(c) The top 10 grossing films released in a year 

(d) The years of birth for the runners in the Boston marathon 


5. Determine whether the study is an observational study or an experiment.
Explain.


(a) Researchers conduct a study to determine whether body mass index 
(BMI) influences mortality. To conduct the study, researchers obtained 
the BMIs of 3,951,455 people. (Source: Elsevier, Ltd.) 


(b) Researchers conduct a study to determine whether taking a multivitamin 
daily affects cognitive health among men as they age. To perform the 
study, researchers studied 5947 male physicians ages 65 years or older 
and had one group take a multivitamin daily and had another group take 
a placebo daily. (Source: American College of Physicians) 


6. An experiment is performed to test the effects of a new drug on high blood
pressure. The experimenter identifies 320 people ages 35-50 years old with
high blood pressure for participation in the experiment. The subjects are
divided into equal groups according to age. Within each group, subjects are
then randomly selected to be in either the treatment group or the control
group. What type of experimental design is being used for this experiment?


7. Identify the sampling technique used in each study. Explain your reasoning. 


(a) A journalist asks people at a campground about air pollution. 

(b) For quality assurance, every tenth machine part is selected from an 
assembly line and measured for accuracy. 

(c) A study on attitudes about smoking is conducted at a college. The 
students are divided by class (freshman, sophomore, junior, and senior). 
Then a random sample is selected from each class and interviewed. 


8. Which technique used in Exercise 7 could lead to a biased study? Explain. 




Chapter Test


Take this test as you would take a test in class. 

1. Determine whether you would take a census or use a sampling. If you would 
use a sampling, determine which sampling technique you would use. Explain. 
(a) The most popular type of investment among investors in New Jersey 
(b) The average age of the 30 employees of a company 


2. Determine whether each number describes a population parameter or a 
sample statistic. Explain. 


(a) A survey of 1003 U.S. adults ages 18 years and older found that 72% own 
a smartphone. (Source: Pew Research Center) 


(b) In a recent year, the average evidence-based reading and writing score on
the SAT was 543. (Source: The College Board)


3. Identify the sampling technique used, and discuss potential sources of bias 
(if any). Explain. 
(a) Chosen at random, 200 male and 200 female high school students are 
asked about their plans after high school. 


(b) Chosen at random, 625 customers at an electronics store are contacted 
and asked their opinions of the service they received. 


(c) Questioning teachers as they leave a faculty lounge, a researcher asks 
45 of them about their teaching styles. 


4. Determine whether the data are qualitative or quantitative, and determine the 
level of measurement of the data set. Explain your reasoning. 


(a) The numbers of employees at fast-food restaurants in a city are listed. 


20 11 6 [illegible] 17 23 12 18 40 22
13 8 18 14 37 32 25 27 25 18 


(b) The grade point averages (GPAs) for a class of students are listed. 


3.6 3.2 2.0 3.8 3.0 3.5 1.7 3.2
2.2 4.0 [illegible] 1.9 2.8 3.6 2.5 3.7


5. Determine whether the survey question is biased. If the question is biased, 
suggest a better wording. 


(a) How many hours of sleep do you get on a normal night? 
(b) Do you agree that the town’s ban on skateboarding in parks is unfair? 


6. Researchers surveyed 19,183 U.S. physicians, asking for the information
below. (Source: Medscape from WebMD)
location (region of the U.S.)
income (dollars)
employment status (private practice or an employee)
benefits received (health insurance, liability coverage, etc.)
specialty (cardiology, family medicine, radiology, etc.)
time spent seeing patients per week (hours)


(a) Identify the population and the sample. 
(b) Are the data collected qualitative, quantitative, or both? Explain your reasoning.
(c) Determine the level of measurement for each item above. 


(d) Determine whether the study is an observational study or an experiment. 
Explain. 


Putting it all together 


REAL DECISIONS 


You are a researcher for a professional research firm. Your firm has 
won a contract to conduct a study for a technology publication. The 
editors of the publication would like to know their readers’ thoughts on 
using smartphones for making and receiving payments, for redeeming 
coupons, and as tickets to events. They would also like to know 
whether people are interested in using smartphones as digital wallets 
that store data from their drivers’ licenses, health insurance cards, and 
other cards. 

The editors have given you their readership database and 20 
questions they would like to ask (two sample questions from a previous 
study are given at the right). You know that it is too expensive to 
contact all of the readers, so you need to determine a way to contact a 
representative sample of the entire readership population. 


EXERCISES 


1. How Would You Do It? 


(a) What sampling technique would you use to select the sample for 
the study? Why? 


(b) Will the technique you chose in part (a) give you a sample that 
is representative of the population? 


(c) Describe the method for collecting data. 
(d) Identify possible flaws or biases in your study. 


2. Data Classification 


(a) What type of data do you expect to collect: qualitative,
quantitative, or both? Why?


(b) At what levels of measurement do you think the data in the 
study will be? Why? 


(c) Will the data collected for the study represent a population or 
a sample? 

(d) Will the numerical descriptions of the data be parameters or 
statistics? 

3. How They Did It 

When The Harris Poll did a similar study, they used an Internet
survey.

(a) Describe some possible errors in collecting data by Internet 
surveys. 

(b) Compare your method for collecting data in Exercise 1 to this 
method. 




When do you think smartphone payments will replace payment card
transactions for a majority of purchases?
Within the next year: 2%
1 year to less than 3 years: 10%
3 years to less than 5 years: 18%
5 years to less than 10 years: 19%
10 years or more: 14%
(Source: The Harris Poll)

How interested are you in being able to use your smartphone to make
payments, rather than using cash or payment cards?
Very interested: 9%
Somewhat interested: 18%
Not very interested: 14%
Not at all interested: 44%
Not at all sure: 15%
(Source: The Harris Poll)


HISTORY OF STATISTICS: TIMELINE

17TH CENTURY

John Graunt (1620-1674): Studied records of deaths in London in the early
1600s. He was the first to make extensive statistical observations from massive
amounts of data (Chapter 2), and his work laid the foundation for modern statistics.

Blaise Pascal (1623-1662) and Pierre de Fermat (1601-1665): Pascal and Fermat
corresponded about basic probability problems (Chapter 3), especially those
dealing with gaming and gambling.

18TH CENTURY

Pierre Laplace (1749-1827): Studied probability (Chapter 3) and is credited with
putting probability on a sure mathematical footing.

Carl Friedrich Gauss (1777-1855): Studied regression and the method of least
squares (Chapter 9) through astronomy. In his honor, the normal distribution
(Chapter 5) is sometimes called the Gaussian distribution.

19TH CENTURY

Lambert Quetelet (1796-1874): Used descriptive statistics (Chapter 2) to analyze
crime and mortality data and studied census techniques. Described normal
distributions (Chapter 5) in connection with human traits such as height.

Florence Nightingale (1820-1910): A nurse during the Crimean War, she was one of
the first to advocate the importance of sanitation in hospitals. She was also one
of the first statisticians to use descriptive statistics (Chapter 2) as a way to
argue for social change and is credited with having developed the Coxcomb chart.

Francis Galton (1822-1911): Used regression and correlation (Chapter 9) to study
genetic variation in humans. He is credited with the discovery of the Central
Limit Theorem (Chapter 5).

20TH CENTURY

Karl Pearson (1857-1936): Studied natural selection using correlation (Chapter 9).
Formed the first academic department of statistics and helped develop chi-square
analysis (Chapter 6).

William Gosset (1876-1937): Studied the brewing process and developed the t-test
to correct problems connected with small sample sizes (Chapter 6).

Charles Spearman (1863-1945): British psychologist who was one of the first to
develop intelligence testing using factor analysis (Chapter 10).

Ronald Fisher (1890-1962): Studied biology and natural selection, developed ANOVA
(Chapter 10), stressed the importance of experimental design (Chapter 1), and was
the first to identify the null and alternative hypotheses (Chapter 7).

20TH CENTURY (later)

Frank Wilcoxon (1892-1965): Biochemist who used statistics to study plant
pathology. He introduced two-sample tests (Chapter 8), which led the way to the
development of nonparametric statistics.

John Tukey (1915-2000): Worked at Princeton during World War II. Introduced
exploratory data analysis techniques such as stem-and-leaf plots (Chapter 2). Also
worked at Bell Laboratories and is best known for his work in inferential
statistics (Chapters 6-11).

David Kendall (1918-2007): Worked at Princeton and Cambridge. Was a leading
authority on applied probability and data analysis (Chapters 2 and 3).




TECHNOLOGY 


Using Technology in Statistics 


With large data sets, you will find that calculators or computer software 
programs can help perform calculations and create graphics. These 
calculations can be performed on many calculators and statistical 
software programs, such as Minitab, Excel, and the TI-84 Plus. 

The following example shows how each of these three technologies can be
used to generate a list of random numbers. This list of random numbers
can be used to select sample members or perform simulations.


Generating a List of Random Numbers 


A quality control department inspects a random sample of 15 of the 
167 cars that are assembled at an auto plant. How should the cars be 
chosen? 


SOLUTION 


One way to choose the sample is to first number the cars from 1 to 167. 
Then you can use technology to form a list of random numbers from 
1 to 167. Each of the technology tools shown requires different steps 
to generate the list. Each, however, does require that you identify the 
minimum value as 1 and the maximum value as 167. Check your user’s
manual for specific instructions.

TI-84 PLUS
randInt(1, 167, 15)
[Screen output: a list of 15 random integers from 1 to 167; the individual values are not legible in this copy.]
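For readers working without one of these three tools, a similar list can be sketched with Python's standard `random` module. This is an illustrative alternative, not one of the technologies the text covers. `random.randint(1, 167)` returns a whole number from 1 to 167 inclusive, and in this sketch the same number may appear more than once.

```python
import random

rng = random.Random()  # pass a fixed seed, e.g. random.Random(7), for a repeatable list

# 15 car numbers from 1 to 167; the same number may appear more than once.
cars = [rng.randint(1, 167) for _ in range(15)]
print(sorted(cars))
```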






Recall that when you generate a list of random numbers, you 
should decide whether it is acceptable to have numbers that repeat. If it 
is acceptable, then the sampling process is said to be with replacement. 
If it is not acceptable, then the sampling process is said to be without 
replacement. 
With each of the three technology tools shown on page 58, you have 
the capability of sorting the list so that the numbers appear in order. 
Sorting helps you see whether any of the numbers in the list repeat. 
If it is not acceptable to have repeats, you should specify that the tool 
generate more random numbers than you need. 
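The two sampling processes can be contrasted in a short Python sketch. The helper names are hypothetical. `without_replacement` mirrors the advice above by drawing extra numbers and skipping repeats, although Python's `random.sample(range(low, high + 1), n)` would return distinct numbers directly.

```python
import random


def with_replacement(n, low, high, rng):
    """n integers from low to high; repeats are allowed."""
    return [rng.randint(low, high) for _ in range(n)]


def without_replacement(n, low, high, rng):
    """Generate extra numbers and discard repeats until n distinct values are collected."""
    if n > high - low + 1:
        raise ValueError("cannot draw more distinct numbers than the range holds")
    chosen = []
    while len(chosen) < n:
        x = rng.randint(low, high)
        if x not in chosen:  # skip a repeat and draw again
            chosen.append(x)
    return chosen


rng = random.Random(11)
print(sorted(with_replacement(15, 1, 167, rng)))     # may contain repeats
print(sorted(without_replacement(15, 1, 167, rng)))  # never contains repeats
```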


1. The SEC (Securities and Exchange Commission) is investigating a financial
services company. The company being investigated has 86 brokers. The SEC
decides to review the records for a random sample of 10 brokers. Describe how
this investigation could be done. Then use technology to generate a list of
10 random numbers from 1 to 86 and order the list.

2. A quality control department is testing 25 smartphones from a shipment of
300 smartphones. Describe how this test could be done. Then use technology to
generate a list of 25 random numbers from 1 to 300 and order the list.

3. Consider the population of ten digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9.
Select three random samples of five digits from this list. Find the average of
each sample. Compare your results with the average of the entire population.
Comment on your results. (Hint: To find the average, sum the data entries and
divide the sum by the number of entries.)

4. Consider the population of 41 whole numbers from 0 to 40. What is the
average of these numbers? Select three random samples of seven numbers from
this list. Find the average of each sample. Compare your results with the
average of the entire population. Comment on your results. (Hint: To find the
average, sum the data entries and divide the sum by the number of entries.)

5. Use random numbers to simulate rolling a six-sided die 60 times. How many
times did you obtain each number from 1 to 6? Are the results what you expected?

6. You rolled a six-sided die 60 times and got the following tally.
20 ones
20 twos
15 threes
3 fours
2 fives
0 sixes
Does this seem like a reasonable result? What inference might you draw from
the result?

7. Use random numbers to simulate tossing a coin 100 times. Let 0 represent
heads, and let 1 represent tails. How many times did you obtain each number?
Are the results what you expected?

8. You tossed a coin 100 times and got 77 heads and 23 tails. Does this seem
like a reasonable result? What inference might you draw from the result?

9. A political analyst would like to survey a sample of the registered voters
in a county. The county has 47 election districts. How could the analyst use
random numbers to obtain a cluster sample?


Extended solutions are given in the technology manuals that accompany this text. 
Technical instruction is provided for Minitab, Excel, and the TI-84 Plus. 




Descriptive Statistics


Since the 1966 season, the National Football League has determined its champion in the 
Super Bowl. The winning team receives the Lombardi Trophy. 


2.1 


2.2 


2.3 
2.4 


Activity 
Case Study 


2.5 


Uses and Abuses 
Real Statistics—Real Decisions 
Technology 


Where You’ve Been

In Chapter 1, you learned that there are many ways to
collect data. Usually, researchers must work with sample
data in order to analyze populations, but occasionally it
is possible to collect all the data for a given population.
For instance, the data listed below represent the points
scored by the winning teams in the first 51 Super Bowls.
(Source: NFL.com)

35, 33, 16, 23, 16, 24, 14, 24, 16, 21, 32, 27, 35, 31, 27, 26, 27,
38, 38, 46, 39, 42, 20, 55, 20, 37, 52, 30, 49, 27, 35, 31, 34, 23,
34, 20, 48, 32, 24, 21, 29, 17, 27, 31, 31, 21, 34, 43, 28, 24, 34

Where You’re Going


In Chapter 2, you will learn ways to organize and describe
data sets. The goal is to make the data easier to understand
by describing trends, averages, and variations. For instance,
in the raw data showing the points scored by the winning
teams in the first 51 Super Bowls, it is not easy to see any
patterns or special characteristics. Here are some ways you
can organize and describe the data.

Make a frequency distribution.

Class | Frequency, f
14-19 | 5
20-25 | 12
26-31 | 13
32-37 | 11
38-43 | 5
44-49 | 3
50-55 | 2

Draw a histogram.

[Histogram of the points scored: horizontal axis Points, marked at 13.5, 19.5, 25.5, 31.5, 37.5, 43.5, 49.5, 55.5; vertical axis Frequency]

Find an average.

$$\text{Mean} = \frac{35 + 33 + 16 + 23 + 16 + \cdots + 43 + 28 + 24 + 34}{51} = \frac{1541}{51} \approx 30.2 \text{ points}$$

Find how the data vary.

Range = 55 - 14 = 41 points
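The two summary numbers in this preview can be checked with a short Python sketch that uses the 51 winning scores listed above and only built-in functions (`sum`, `len`, `max`, `min`).

```python
# Points scored by the winning teams in the first 51 Super Bowls (from the list above).
points = [
    35, 33, 16, 23, 16, 24, 14, 24, 16, 21, 32, 27, 35, 31, 27, 26, 27,
    38, 38, 46, 39, 42, 20, 55, 20, 37, 52, 30, 49, 27, 35, 31, 34, 23,
    34, 20, 48, 32, 24, 21, 29, 17, 27, 31, 31, 21, 34, 43, 28, 24, 34,
]

total = sum(points)
mean = total / len(points)              # sum of the entries divided by the number of entries
data_range = max(points) - min(points)  # maximum entry minus minimum entry

print(len(points), total)  # 51 1541
print(round(mean, 1))      # 30.2
print(data_range)          # 41
```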




62 CHAPTER 2. Descriptive Statistics 


2.1 


What You Should Learn 


» How to construct a frequency 
distribution, including limits, 
midpoints, relative frequencies, 
cumulative frequencies, and 
boundaries 


» How to construct frequency
histograms, frequency 
polygons, relative frequency 
histograms, and ogives 


Example of a
Frequency Distribution

Class | Frequency, f
1-5 | 5
6-10 | 8
11-15 | 6
16-20 | 8
21-25 | 5
26-30 | 4


Study Tip 


In general, the frequency 
distributions shown in this 
text will use the minimum 
data entry for the lower 
limit of the first class. 
Sometimes it may be more 
convenient to choose a lower limit 
that is slightly less than the minimum 
data entry. The frequency distribution 
produced will vary slightly. 


Frequency Distributions and Their Graphs 


Frequency Distributions ■ Graphs of Frequency Distributions


Frequency Distributions 


There are many ways to organize and describe a data set. Important characteristics 
to look for when organizing and describing a data set are its center, its variability 
(or spread), and its shape. Measures of center and shapes of distributions are 
covered in Section 2.3. Measures of variability are covered in Section 2.4. 

When a data set has many entries, it can be difficult to see patterns. In 
this section, you will learn how to organize data sets by grouping the data into 
intervals called classes and forming a frequency distribution. You will also learn 
how to use frequency distributions to construct graphs. 


DEFINITION 


A frequency distribution is a table that shows classes or intervals of data
entries with a count of the number of entries in each class. The frequency f of
a class is the number of data entries in the class.


In the frequency distribution shown at the left, there are six classes. The 
frequencies for each of the six classes are 5, 8, 6, 8, 5, and 4. Each class has a 
lower class limit, which is the least number that can belong to the class, and an 
upper class limit, which is the greatest number that can belong to the class. In 
the frequency distribution shown, the lower class limits are 1, 6, 11, 16, 21, and 
26, and the upper class limits are 5, 10, 15, 20, 25, and 30. The class width is the 
distance between lower (or upper) limits of consecutive classes. For instance, 
the class width in the frequency distribution shown is 6 - 1 = 5. Notice that the
classes do not overlap. 

The difference between the maximum and minimum data entries is called 
the range. In the frequency table shown, suppose the maximum data entry is 29, 
and the minimum data entry is 1. The range then is 29 - 1 = 28. You will learn
more about the range of a data set in Section 2.4. 


GUIDELINES 


Constructing a Frequency Distribution from a Data Set 


1. Decide on the number of classes to include in the frequency distribution. 
The number of classes should be between 5 and 20; otherwise, it may be 
difficult to detect any patterns. 


2. Find the class width as follows. Determine the range of the data, divide
the range by the number of classes, and round up to the next convenient
number.

3. Find the class limits. You can use the minimum data entry as the lower
limit of the first class. To find the remaining lower limits, add the class
width to the lower limit of the preceding class. Then find the upper
limit of the first class. Remember that classes cannot overlap. Find the
remaining upper class limits.

4. Make a tally mark for each data entry in the row of the appropriate class.

5. Count the tally marks to find the total frequency f for each class.
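Steps 2 and 3 can be sketched in Python for whole-number data. The sketch makes one assumption: it uses the smallest whole number greater than the range divided by the number of classes as the class width. This also covers the Study Tip's whole-number case. A person may prefer a rounder "convenient" width instead. The function names are hypothetical, and the sample call uses the minimum (155) and maximum (405) from Example 1.

```python
def class_width(minimum, maximum, n_classes):
    """Smallest whole number greater than range / n_classes (whole-number data assumed)."""
    data_range = maximum - minimum
    # Floor division then +1 rounds a fraction up and bumps an exact whole number
    # to the next whole number, so the last class always reaches the maximum.
    return data_range // n_classes + 1


def class_limits(minimum, width, n_classes):
    """(lower, upper) limits; each upper limit is one less than the next lower limit."""
    return [
        (minimum + i * width, minimum + (i + 1) * width - 1)
        for i in range(n_classes)
    ]


width = class_width(155, 405, 7)
print(width)  # 36
print(class_limits(155, width, 7))
# [(155, 190), (191, 226), (227, 262), (263, 298), (299, 334), (335, 370), (371, 406)]
```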


Study Tip

If you obtain a whole number when calculating the class width of a
frequency distribution, use the next whole number as the class width.
Doing this ensures that you will have enough space in your frequency
distribution for all the data entries.


Lower limit | Upper limit 


155 190 
191 226 
227 262 
263 298 
299 334 
335 370 
371 406 


Study Tip 


The uppercase Greek
letter sigma (Σ) is
used throughout statistics to
indicate a summation of values.


SECTION 2.1 Frequency Distributions and Their Graphs 63 


Constructing a Frequency Distribution from a Data Set 


The data set lists the out-of-pocket prescription medicine expenses (in dollars) 
for 30 U.S. adults in a recent year. Construct a frequency distribution that has 
seven classes. (Adapted from: Health, United States, 2015) 


200 239 155 252 384 165 296 405 303 400 
307 241 256 315 330 317 352 266 276 345 
238 306 290 271 345 312 293 195 168 342 


SOLUTION 
1. The number of classes (7) is stated in the problem. 


2. The minimum data entry is 155 and the maximum data entry is 405, so the 
range is 405 - 155 = 250. Divide the range by the number of classes and
round up to find the class width. 


$$\text{Class width} = \frac{\text{Range}}{\text{Number of classes}} = \frac{250}{7} \approx 35.71$$

Round up to the next convenient number, 36.


3. The minimum data entry is a convenient lower limit for the first class. To 
find the lower limits of the remaining six classes, add the class width of 36 to 
the lower limit of each previous class. So, the lower limits of the other classes 
are 155 + 36 = 191, 191 + 36 = 227, and so on. The upper limit of the first
class is 190, which is one less than the lower limit of the second class. The 
upper limits of the other classes are 190 + 36 = 226, 226 + 36 = 262, and 
so on. The lower and upper limits for all seven classes are shown at the left. 


4. Make a tally mark for each data entry in the appropriate class. For instance, 
the data entry 168 is in the 155-190 class, so make a tally mark in that class. 
Continue until you have made a tally mark for each of the 30 data entries. 


5. The number of tally marks for a class is the frequency of that class. 


The frequency distribution is shown below. The first class, 155-190, has three 
tally marks. So, the frequency of this class is 3. Notice that the sum of the 
frequencies is 30, which is the number of entries in the data set. The sum is 
denoted by Σf, where Σ is the uppercase Greek letter sigma.


Frequency Distribution for Out-of-Pocket
Prescription Medicine Expenses (in dollars)

Class (expenses) | Tally | Frequency, f (number of adults)
155-190 | III | 3
191-226 | II | 2
227-262 | IIIII | 5
263-298 | IIIII I | 6
299-334 | IIIII II | 7
335-370 | IIII | 4
371-406 | III | 3
Σf = 30

Check that the sum of the frequencies equals the number in the sample.
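Steps 4 and 5 can be checked by computer. The Python sketch below assigns each expense to its class using integer division. This works here because every class has the same width, 36, and the data are whole dollars. It then counts the entries with `collections.Counter`. The helper name `class_index` is hypothetical.

```python
from collections import Counter

# Out-of-pocket prescription medicine expenses (in dollars) for 30 adults, from Example 1.
expenses = [
    200, 239, 155, 252, 384, 165, 296, 405, 303, 400,
    307, 241, 256, 315, 330, 317, 352, 266, 276, 345,
    238, 306, 290, 271, 345, 312, 293, 195, 168, 342,
]

LOWEST, WIDTH, N_CLASSES = 155, 36, 7  # lower limit of first class, class width, classes


def class_index(x):
    """0-based index of the (non-overlapping) class containing whole-dollar value x."""
    return (x - LOWEST) // WIDTH


tally = Counter(class_index(x) for x in expenses)
for i in range(N_CLASSES):
    low = LOWEST + i * WIDTH
    print(f"{low}-{low + WIDTH - 1}: {tally[i]}")  # frequencies 3, 2, 5, 6, 7, 4, 3

# Check that the sum of the frequencies equals the number of entries (30).
print("sum of frequencies:", sum(tally.values()))
```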




TRY IT YOURSELF 1 
Construct a frequency distribution using the points scored by the 51 winning
teams listed on page 61. Use six classes. Answer: Page A31

Note in Example 1 that the classes do not overlap, so each of the original
data entries belongs to exactly one class. Also, the classes are of equal width. In
general, all classes in a frequency distribution have the same width. However,
this may not always be possible because a class can be open-ended. For instance,
the frequency distribution for the population of Iowa shown below has an
open-ended class, “80 and older.”

Population of Iowa
Ages | Frequency
0-9 | 399,859
10-19 | 424,850
20-29 | 412,354
30-39 | [illegible]
40-49 | 368,620
50-59 | 421,726
60-69 | 356,124
70-79 | 203,053
80 and older | 143,699
The last class, 80 and older, is open-ended.

After constructing a standard frequency distribution such as the one in
Example 1, you can include several additional features that will help provide
a better understanding of the data. These features (the midpoint, relative
frequency, and cumulative frequency of each class) can be included as additional
columns in your table.

DEFINITION

The midpoint of a class is the sum of the lower and upper limits of the class
divided by two. The midpoint is sometimes called the class mark.

$$\text{Midpoint} = \frac{(\text{Lower class limit}) + (\text{Upper class limit})}{2}$$


The relative frequency of a class is the portion, or percentage, of the data 
that falls in that class. To find the relative frequency of a class, divide the 
frequency f by the sample size n. 


$$\text{Relative frequency} = \frac{\text{Class frequency}}{\text{Sample size}} = \frac{f}{n}$$

Note that n = Σf.


The cumulative frequency of a class is the sum of the frequencies of that class 
and all previous classes. The cumulative frequency of the last class is equal 
to the sample size n. 


You can use the formula shown above to find the midpoint of each class, or 
after finding the first midpoint, you can find the remaining midpoints by adding 
the class width to the previous midpoint. For instance, the midpoint of the first 
class in Example 1 is 


$$\text{Midpoint} = \frac{155 + 190}{2} = 172.5 \qquad \text{(midpoint of first class)}$$


Using the class width of 36, the remaining midpoints are 


172.5 + 36 = 208.5 Midpoint of second class. 

208.5 + 36 = 244.5 Midpoint of third class. 

244.5 + 36 = 280.5 Midpoint of fourth class. 
and so on. 


You can write the relative frequency as a fraction, decimal, or percent. The sum of the relative frequencies of all the classes should be equal to 1, or 100%. Due to rounding, the sum may be slightly less than or greater than 1, so values such as 0.99 and 1.01 are acceptable.


SECTION 2.1. Frequency Distributions and Their Graphs 65 


Finding Midpoints, Relative Frequencies, and Cumulative 
Frequencies 


Using the frequency distribution constructed in Example 1, find the midpoint, 
relative frequency, and cumulative frequency of each class. Describe any 
patterns. 


SOLUTION 


The midpoints, relative frequencies, and cumulative frequencies of the first 
five classes are calculated as follows. 


Class | f | Midpoint | Relative frequency | Cumulative frequency
155-190 | 3 | (155 + 190)/2 = 172.5 | 3/30 = 0.1 | 3
191-226 | 2 | (191 + 226)/2 = 208.5 | 2/30 ≈ 0.07 | 3 + 2 = 5
227-262 | 5 | (227 + 262)/2 = 244.5 | 5/30 ≈ 0.17 | 5 + 5 = 10
263-298 | 6 | (263 + 298)/2 = 280.5 | 6/30 = 0.2 | 10 + 6 = 16
299-334 | 7 | (299 + 334)/2 = 316.5 | 7/30 ≈ 0.23 | 16 + 7 = 23


The remaining midpoints, relative frequencies, and cumulative frequencies are 
shown in the expanded frequency distribution below. 


Frequency Distribution for Out-of-Pocket Prescription Medicine Expenses (in dollars)

Class | Frequency, f (number of adults) | Midpoint | Relative frequency (portion of adults) | Cumulative frequency
155-190 | 3 | 172.5 | 0.1 | 3
191-226 | 2 | 208.5 | 0.07 | 5
227-262 | 5 | 244.5 | 0.17 | 10
263-298 | 6 | 280.5 | 0.2 | 16
299-334 | 7 | 316.5 | 0.23 | 23
335-370 | 4 | 352.5 | 0.13 | 27
371-406 | 3 | 388.5 | 0.1 | 30
Total | Σf = 30 | | Σ(f/n) = 1 |


Interpretation There are several patterns in the data set. For instance, the 
most common range for the expenses is $299 to $334. Also, about half of the 
expenses are less than $299. 


TRY IT YOURSELF 2 


Using the frequency distribution constructed in Try It Yourself 1, find the 
midpoint, relative frequency, and cumulative frequency of each class. Describe 
any patterns. Answer: Page A31 


CHAPTER 2 Descriptive Statistics

Class | Class boundaries | Frequency, f
155-190 | 154.5-190.5 | 3
191-226 | 190.5-226.5 | 2
227-262 | 226.5-262.5 | 5
263-298 | 262.5-298.5 | 6
299-334 | 298.5-334.5 | 7
335-370 | 334.5-370.5 | 4
371-406 | 370.5-406.5 | 3


Study Tip 
It is customary in bar 
graphs to have spaces 
between the bars, 
whereas with histograms, 
it is customary that the 
bars have no spaces 
between them. 


Graphs of Frequency Distributions 


Sometimes it is easier to discover patterns in a data set by looking at a graph of 
the frequency distribution. One such graph is a frequency histogram. 


DEFINITION 


A frequency histogram uses bars to represent the frequency distribution of a 
data set. A histogram has the following properties. 


1. The horizontal scale is quantitative and measures the data entries. 
2. The vertical scale measures the frequencies of the classes. 
3. Consecutive bars must touch. 


Because consecutive bars of a histogram must touch, bars must begin and 
end at class boundaries instead of class limits. Class boundaries are the numbers 
that separate classes without forming gaps between them. For data that are 
integers, subtract 0.5 from each lower limit to find the lower class boundaries. To 
find the upper class boundaries, add 0.5 to each upper limit. The upper boundary 
of a class will equal the lower boundary of the next higher class. 
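For integer data, the boundary rule is a fixed shift of 0.5 on each side. A minimal Python sketch, assuming integer class limits such as those in Example 1; `class_boundaries` is a hypothetical helper name:

```python
def class_boundaries(classes):
    """Shift integer class limits by 0.5 to get boundaries with no gaps."""
    return [(lower - 0.5, upper + 0.5) for lower, upper in classes]


classes = [(155, 190), (191, 226), (227, 262), (263, 298),
           (299, 334), (335, 370), (371, 406)]
bounds = class_boundaries(classes)

# Each upper boundary equals the next class's lower boundary, so bars touch.
touching = all(bounds[i][1] == bounds[i + 1][0] for i in range(len(bounds) - 1))
print(bounds[0], bounds[-1], touching)  # (154.5, 190.5) (370.5, 406.5) True
```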


Constructing a Frequency Histogram 


Draw a frequency histogram for the frequency distribution in Example 2. 
Describe any patterns. 


SOLUTION 


First, find the class boundaries. Because the data entries are integers, subtract 
0.5 from each lower limit to find the lower class boundaries and add 0.5 to 
each upper limit to find the upper class boundaries. So, the lower and upper 
boundaries of the first class are as follows. 


First class lower boundary = 155 - 0.5 = 154.5
First class upper boundary = 190 + 0.5 = 190.5


The boundaries of the remaining classes are shown in the table at the left. 
To construct the histogram, choose possible frequency values for the vertical 
scale. You can mark the horizontal scale either at the midpoints or at the class 
boundaries. Both histograms are shown below. 


[Figure: two frequency histograms, "Out-of-Pocket Prescription Medicine Expenses," one labeled with class midpoints and one labeled with class boundaries (drawn with a broken axis). Horizontal axis: Expense (in dollars); vertical axis: Frequency (number of adults).]


Interpretation From either histogram, you can see that two-thirds of the 
adults are paying more than $262.50 for out-of-pocket prescription medicine 
expenses. 
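Without plotting software, a rough sideways histogram can be printed as text. The sketch below is an illustration only, assuming the Example 2 boundaries and frequencies; `text_histogram` is a hypothetical helper. It also reproduces the two-thirds figure in the interpretation by adding the frequencies of the classes whose lower boundary is at least 262.5.

```python
boundaries = [154.5, 190.5, 226.5, 262.5, 298.5, 334.5, 370.5, 406.5]
freqs = [3, 2, 5, 6, 7, 4, 3]


def text_histogram(boundaries, freqs, symbol="#"):
    """One row per class; the bar length equals the class frequency."""
    rows = []
    for i, f in enumerate(freqs):
        label = f"{boundaries[i]}-{boundaries[i + 1]}"
        rows.append(f"{label:>11} | {symbol * f} ({f})")
    return "\n".join(rows)


print(text_histogram(boundaries, freqs))

# Adults paying more than $262.50 (classes whose lower boundary is 262.5 or more)
above = sum(f for lower, f in zip(boundaries, freqs) if lower >= 262.5)
print(above, "of", sum(freqs))  # 20 of 30
```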




TRY IT YOURSELF 3 


Use the frequency distribution from Try It Yourself 2 to construct a frequency 
histogram that represents the points scored by the 51 winning teams listed on 
page 61. Describe any patterns. Answer: Page A32 


Another way to graph a frequency distribution is to use a frequency polygon. 
A frequency polygon is a line graph that emphasizes the continuous change in 
frequencies. 


Constructing a Frequency Polygon 


Draw a frequency polygon for the frequency distribution in Example 2. 
Describe any patterns. 


SOLUTION 


To construct the frequency polygon, use the same horizontal and vertical scales 
that were used in the histogram labeled with class midpoints in Example 3. 
Then plot points that represent the midpoint and frequency of each class and 
connect the points in order from left to right with line segments. Because the 
graph should begin and end on the horizontal axis, extend the left side to one 
class width before the first class midpoint and extend the right side to one class 
width after the last class midpoint. 


[Figure: frequency polygon, "Out-of-Pocket Prescription Medicine Expenses." Horizontal axis: Expense (in dollars), marked at 136.5, 172.5, 208.5, 244.5, 280.5, 316.5, 352.5, 388.5, and 424.5; vertical axis: Frequency (number of adults).]


You can check your answer using technology, as shown below. 


[Figure: TI-84 Plus display of the frequency polygon]


Interpretation You can see that the frequency of adults increases up to an 
expense of $316.50 and then the frequency decreases. 
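The polygon's vertices follow directly from the midpoints and frequencies, with one extra point of frequency 0 a class width beyond each end. A minimal Python sketch, assuming the Example 2 midpoints and a class width of 36; `polygon_points` is a hypothetical name:

```python
midpoints = [172.5, 208.5, 244.5, 280.5, 316.5, 352.5, 388.5]
freqs = [3, 2, 5, 6, 7, 4, 3]
width = 36


def polygon_points(midpoints, freqs, width):
    """Vertices of a frequency polygon that starts and ends on the horizontal axis."""
    xs = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
    ys = [0] + freqs + [0]
    return list(zip(xs, ys))


points = polygon_points(midpoints, freqs, width)
peak = max(points, key=lambda p: p[1])  # vertex with the greatest frequency
print(points[0], points[-1], peak)  # (136.5, 0) (424.5, 0) (316.5, 7)
```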


TRY IT YOURSELF 4 


Use the frequency distribution from Try It Yourself 2 to construct a frequency 
polygon that represents the points scored by the 51 winning teams listed on 
page 61. Describe any patterns. Answer: Page A32 




A histogram and its corresponding frequency polygon are often drawn 
together, as shown at the left using Excel. To do this by hand, first, construct 
the frequency polygon by choosing appropriate horizontal and vertical scales. 
The horizontal scale should consist of the class midpoints, and the vertical 
scale should consist of appropriate frequency values. Then plot the points that 
represent the midpoint and frequency of each class. After connecting the points 
with line segments, finish by drawing the bars for the histogram. 

A relative frequency histogram has the same shape and the same horizontal 
scale as the corresponding frequency histogram. The difference is that the 
vertical scale measures the relative frequencies, not frequencies. 


[Figure: Excel histogram drawn with its frequency polygon. Horizontal axis: Expense (in dollars), marked at 136.5 through 424.5; vertical axis: Frequency.]


Picturing the World


Old Faithful, a geyser at 
Yellowstone National Park, erupts 
on a regular basis. The time spans 
of a sample of eruptions are 
shown in the relative frequency 
histogram. (Source: Yellowstone 
National Park) 


Constructing a Relative Frequency Histogram 


Draw a relative frequency histogram for the frequency distribution in 
Example 2. 


SOLUTION 


The relative frequency histogram is shown. Notice that the shape of the 
histogram is the same as the shape of the frequency histogram constructed in 
Example 3. The only difference is that the vertical scale measures the relative 
frequencies. 


[Figure: relative frequency histogram, "Old Faithful Eruptions." Horizontal axis: Duration of eruption (in minutes), marked at 2.0, 2.6, 3.2, 3.8, and 4.4; vertical axis: Relative frequency.]

About 50% of the eruptions last less than how many minutes?

[Figure: relative frequency histogram, "Out-of-Pocket Prescription Medicine Expenses." Horizontal axis: Expense (in dollars), marked at the class boundaries 154.5 through 406.5; vertical axis: Relative frequency (portion of adults).]

Interpretation From this graph, you can quickly see that 0.2, or 20%, of the adults have expenses between $262.50 and $298.50, which is not immediately obvious from the frequency histogram in Example 3.


TRY IT YOURSELF 5 


Use the frequency distribution in Try It Yourself 2 to construct a relative 
frequency histogram that represents the points scored by the 51 winning teams 
listed on page 61. 

Answer: Page A32 


To describe the number of data entries that are less than or equal to a certain 
value, construct a cumulative frequency graph. 


DEFINITION 


A cumulative frequency graph, or ogive (pronounced O’jive), is a line graph that displays the cumulative frequency of each class at its upper class boundary. The upper boundaries are marked on the horizontal axis, and the cumulative frequencies are marked on the vertical axis.




GUIDELINES 


Constructing an Ogive (Cumulative Frequency Graph) 


1. Construct a frequency distribution that includes cumulative frequencies as one of the columns.

2. Specify the horizontal and vertical scales. The horizontal scale consists of upper class boundaries, and the vertical scale measures cumulative frequencies.

3. Plot points that represent the upper class boundaries and their corresponding cumulative frequencies.

4. Connect the points in order from left to right with line segments.

5. The graph should start at the lower boundary of the first class (cumulative frequency is 0) and should end at the upper boundary of the last class (cumulative frequency is equal to the sample size).


Constructing an Ogive 
Draw an ogive for the frequency distribution in Example 2. 


SOLUTION 
Using the cumulative frequencies, you can construct the ogive shown. The upper class boundaries, frequencies, and cumulative frequencies are shown in the table. Notice that the graph starts at 154.5, where the cumulative frequency is 0, and the graph ends at 406.5, where the cumulative frequency is 30.

Upper class boundary | f | Cumulative frequency
190.5 | 3 | 3
226.5 | 2 | 5
262.5 | 5 | 10
298.5 | 6 | 16
334.5 | 7 | 23
370.5 | 4 | 27
406.5 | 3 | 30

[Figure: ogive, "Out-of-Pocket Prescription Medicine Expenses." Horizontal axis: Expense (in dollars), marked at 154.5 through 406.5; vertical axis: Cumulative frequency (number of adults), up to 30.]


Interpretation From the ogive, you can see that 10 adults had expenses of 
$262.50 or less. Also, the greatest increase in cumulative frequency occurs 
between $298.50 and $334.50 because the line segment is steepest between 
these two class boundaries. 
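The ogive's plotted points, and the segment where it rises fastest, can be computed from the cumulative frequencies. A minimal Python sketch, assuming the Example 2 boundaries and frequencies; `ogive_points` is a hypothetical name. Because all classes have the same width, the largest rise in cumulative frequency is also the steepest segment.

```python
from itertools import accumulate

boundaries = [154.5, 190.5, 226.5, 262.5, 298.5, 334.5, 370.5, 406.5]
freqs = [3, 2, 5, 6, 7, 4, 3]


def ogive_points(boundaries, freqs):
    """Start at the first lower boundary with cumulative frequency 0,
    then one point per upper class boundary."""
    cumulative = [0] + list(accumulate(freqs))
    return list(zip(boundaries, cumulative))


points = ogive_points(boundaries, freqs)
# Rise between consecutive points; equal class widths make rise proportional to slope
segments = [(x0, x1, y1 - y0) for (x0, y0), (x1, y1) in zip(points, points[1:])]
steepest = max(segments, key=lambda s: s[2])
print(points[-1], steepest)  # (406.5, 30) (298.5, 334.5, 7)
```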


TRY IT YOURSELF 6 


Use the frequency distribution from Try It Yourself 2 to construct an ogive 
that represents the points scored by the 51 winning teams listed on page 61. 
Answer: Page A32 


Another type of ogive uses percent as the vertical axis instead of frequency 
(see Example 5 in Section 2.5). 




Tech Tip 


You can use technology such as Minitab, Excel, StatCrunch, or the TI-84 Plus to create a histogram. (Detailed instructions for using Minitab, Excel, and the TI-84 Plus are shown in the technology manuals that accompany this text.) For instance, here are instructions for creating a histogram on a TI-84 Plus.

STAT ENTER
Enter midpoints in L1.
Enter frequencies in L2.

2nd STAT PLOT
Turn on Plot 1.
Highlight Histogram.
Xlist: L1
Freq: L2

ZOOM 9


If you have access to technology such as Minitab, Excel, StatCrunch, or the TI-84 Plus, you can use it to draw the graphs discussed in this section.


Using Technology to Construct Histograms 


Use technology to construct a histogram for the frequency distribution in 
Example 2. 


SOLUTION 


Using the instructions for a TI-84 Plus shown in the Tech Tip at the left, you 
can draw a histogram similar to the one below on the left. To investigate the 
graph, you can use the trace feature. After pressing TRACE, the midpoint
and the frequency of the first class are displayed, as shown in the figure on the 
right. Use the right and left arrow keys to move through each bar. 


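[Figure: two TI-84 Plus histograms for the expense data, the second showing the trace feature. Horizontal axis: Expense (in dollars), marked at the midpoints 172.5 through 388.5; vertical axis: Frequency.]

[Figure: StatCrunch histogram for the expense data. Horizontal axis: Expense (in dollars), marked at 155, 191, 227, 263, 299, 335, 371, and 407.]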


TRY IT YOURSELF 7 


Use technology and the frequency distribution from Try It Yourself 2 to 
construct a frequency histogram that represents the points scored by the 
51 winning teams listed on page 61. 

Answer: Page A32 




2.1 EXERCISES For Extra Help: MyLab Statistics


Building Basic Skills and Vocabulary 


1. What are some benefits of representing data sets using frequency distributions? 
What are some benefits of using graphs of frequency distributions? 


2. Why should the number of classes in a frequency distribution be between 
5 and 20? 


3. What is the difference between class limits and class boundaries? 
4. What is the difference between relative frequency and cumulative frequency? 


5. After constructing an expanded frequency distribution, what should the sum 
of the relative frequencies be? Explain. 


6. What is the difference between a frequency polygon and an ogive? 


True or False? In Exercises 7-10, determine whether the statement is true or
false. If it is false, rewrite it as a true statement. 


7. In a frequency distribution, the class width is the distance between the lower
and upper limits of a class. 


8. The midpoint of a class is the sum of its lower and upper limits divided by two. 
9. An ogive is a graph that displays relative frequencies. 


10. Class boundaries ensure that consecutive bars of a histogram touch. 


In Exercises 11-14, use the minimum and maximum data entries and the number 
of classes to find the class width, the lower class limits, and the upper class limits. 


11. min = 9, max = 64, 7 classes 12. min = 12, max = 88, 6 classes 


13. min = 17, max = 135, 8 classes 14. min = 54, max = 247, 10 classes 


Reading a Frequency Distribution In Exercises 15 and 16, use the
frequency distribution to find the (a) class width, (b) class midpoints, and 
(c) class boundaries. 


15. Travel Time to Work (in minutes)
Class | Frequency, f
0-10 | 188
11-21 | 372
22-32 | 264
33-43 | 205
44-54 | 83
55-65 | 76
66-76 | 32

16. Toledo, OH, Average Normal Temperatures (°F)
Class | Frequency, f
25-32 | 86
33-40 | 39
41-48 | 41
49-56 | 48
57-64 | 43
65-72 | 68
73-80 | 40


17. Use the frequency distribution in Exercise 15 to construct an expanded 
frequency distribution, as shown in Example 2. 


18. Use the frequency distribution in Exercise 16 to construct an expanded 
frequency distribution, as shown in Example 2. 


Graphical Analysis In Exercises 19 and 20, use the frequency histogram to
(a) determine the number of classes. 
(b) estimate the greatest and least frequencies. 
(c) determine the class width. 


(d) describe any patterns with the data. 


19. Employee Salaries
[Figure: frequency histogram; vertical axis: Frequency.]

20. Roller Coaster Heights
[Figure: frequency histogram; horizontal axis: Height (in feet); vertical axis: Frequency.]


Graphical Analysis In Exercises 21 and 22, use the frequency polygon to
identify the class with the greatest, and the class with the least, frequency. 


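21. MCAT Scores for 90 Applicants
[Figure: frequency polygon; vertical axis: Frequency.]

22. Commuting Distances for 70 Students, Ages 18-24
[Figure: frequency polygon; horizontal axis: Distance (miles), marked 1 through 10; vertical axis: Frequency.]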


Graphical Analysis In Exercises 23 and 24, use the relative frequency histogram to


(a) identify the class with the greatest, and the class with the least, relative 
frequency. 

(b) approximate the greatest and least relative frequencies. 

(c) describe any patterns with the data. 


23. Female Fibula Lengths
[Figure: relative frequency histogram; horizontal axis: Length (in centimeters); vertical axis: Relative frequency.]

24. Campus Security Response Times
[Figure: relative frequency histogram; horizontal axis: Time (in minutes); vertical axis: Relative frequency.]




Graphical Analysis In Exercises 25 and 26, use the ogive to approximate
(a) the number in the sample. 


(b) the location of the greatest increase in frequency. 


25. Black Bears
[Figure: ogive; horizontal axis: Weight (in pounds), marked at class boundaries from 72.5 to 459.5; vertical axis: Cumulative frequency.]

26. Adult Males
[Figure: ogive; horizontal axis: Height (in inches); vertical axis: Cumulative frequency.]


27. Use the ogive in Exercise 25 to approximate 
(a) the cumulative frequency for a weight of 201.5 pounds. 
(b) the weight for which the cumulative frequency is 68. 


(c) the number of black bears that weigh between 158.5 pounds and 
244.5 pounds. 


(d) the number of black bears that weigh more than 330.5 pounds. 


28. Use the ogive in Exercise 26 to approximate 
(a) the cumulative frequency for a height of 72 inches. 
(b) the height for which the cumulative frequency is 15. 
(c) the number of adult males that are between 68 and 72 inches tall. 
(d) the number of adult males that are taller than 70 inches. 


Using and Interpreting Concepts 


Constructing a Frequency Distribution In Exercises 29 and 30, construct a frequency distribution for the data set using the indicated number of classes. In the table, include the midpoints, relative frequencies, and cumulative frequencies. Which class has the greatest class frequency and which has the least class frequency?


29. Online Games Playing Times
Number of classes: 5 
Data set: Times (in minutes) spent playing online games in a day 


12 20 30 25 16 13 8 15 19 12 18 13 
21 16 18 12 13 25 20 21 13 18 12 14 


30. Conveyance Spending
Number of classes: 6 
Data set: Amounts (in dollars) spent on conveyance for a quarter 
of a year 


425 413 318 325 515 213 418 313 410 415
513 521 232 320 528 412 313 425 402 498
321 213 312 514 415 541 451 328 382 238 


A data icon next to an exercise indicates that the data set for that exercise is available within MyStatLab or at www.pearsonglobaleditions.com.




Constructing a Frequency Distribution and a Frequency Histogram 
In Exercises 31-34, construct a frequency distribution and a frequency histogram 
for the data set using the indicated number of classes. Describe any patterns. 


31. Production


Number of classes: 6 
Data set: January production (in units) for 21 manufacturing plants of 
a multinational company 


1254 1248 2415 2697 1698 1387 985 
2034 2169 1478 1312 2307 2802 2011 
2804 1695 1489 1908 2707 1308 1566 


32. Acid Strengths
Number of classes: 5 


Data set: Strengths (in parts per thousand) of 24 acids


57 96 99 90 38 49 70 71 61 86 75 38 
50 45 58 94 98 86 42 48 63 81 87 44 


33. Response Times
Number of classes: 8 


Data set: Response times (in days) of 30 males to a test drive survey 


3 5 4 6 18 15 23 4 6 9
6 5 4 9 3 5 14 2 17 5
6 3 4 9 6 7 10 15 11 3


34. Bowling Speeds 
Number of classes: 8 
Data set: Bowling speeds (in kilometers per hour) of 21 bowlers in a 
cricket series 


128 130 155 142 161 111 121 
100 105 125 162 118 133 135 
142 128 129 136 145 161 129 




Constructing a Frequency Distribution and a Frequency Polygon 
In Exercises 35 and 36, construct a frequency distribution and a frequency polygon 
for the data set using the indicated number of classes. Describe any patterns. 


35. Ages of the Presidents
Number of classes: 6
Data set: Ages of the U.S. presidents at inauguration (Source: The
White House)


57 61 57 57 58 57 61 54 68 51 49 64 50 48 65 
52 56 46 54 49 51 47 55 55 54 42 51 56 55 51 
54 51 60 62 43 55 56 61 52 69 64 46 54 47 70 


36. Regnal Years
Number of classes: 5 
Data set: Regnal years of the monarchs of Great Britain (Source: 
Britain Express) 


37 19 2 6 5 28 26 15 6 9 4 16 3 
34 2 1 1 19 5 2 14 1 21 13 35
19 1 35 10 17 55 35 20 50 22 14 9 
39 22 1 2 24 38 6 1 5 45 22 24 
9 1 25 3 13 12 8 




Constructing a Frequency Distribution and a Relative Frequency 
Histogram In Exercises 37-40, construct a frequency distribution and a 
relative frequency histogram for the data set using five classes. Which class has the 
greatest relative frequency and which has the least relative frequency? 


37. Taste Test
Data set: Ratings from 1 (lowest) to 10 (highest) provided by 
36 people after taste testing a new flavor of ice cream 


1 3 5 6 9 10 2 5 6
6 5 6 9 4 2 3 6 4
6 5 9 8 7 4 5 7 1
1 2 5 6 7 1 3 6 9


38. Years of Driving
Data set: Years of service of the 28 best drivers in a city in France


15 17 12 16 14 10 12 
14 13 16 11 18 14 16
18 19 12 9 23 19 17 
12 15 14 13 19 18 16 


39. Polar Bears


Data set: Weights (in kilograms) of 28 adult male polar bears 


450 420 489 456 417 413 499 
456 418 436 459 475 429 415 
436 425 469 468 412 436 491 
402 409 473 496 463 417 420 


40. Systolic Blood Pressures
Data set: Systolic blood pressure levels (in millimeters of mercury) of 
28 patients 


125 130 170 184 95 110 111 
117 126 141 157 129 133 136
128 142 108 106 113 128 133 
164 153 138 127 118 163 118 




Constructing a Cumulative Frequency Distribution and an Ogive 
In Exercises 41 and 42, construct a cumulative frequency distribution and an ogive 
for the data set using six classes. Then describe the location of the greatest increase 
in frequency. 


41. Retirement Ages
Data set: Retirement ages of 35 Statistics professors 


65 66 83 75 49 54 56 
59 54 58 57 65 64 69 


42. Calorie Intakes
Data set: Daily calorie intakes (in kilojoules) of 28 people 


10500 9800 9500 9200 11000 13500 11500


14000 7800 9600 10800 10200 11000 11400 
12400 8600 9900 10900 10000 9900 10600 
14500 16900 14000 15500 9400 10800 10100 




In Exercises 43 and 44, use the data set and the indicated number of classes to 
construct (a) an expanded frequency distribution, (b) a frequency histogram, 
(c) a frequency polygon, (d) a relative frequency histogram, and (e) an ogive. 


43. Road Accidents
Data set: Number of accidents per day in a city 


144 4 18 30 9 10 11 2 12 8 16 6 
12 16 9 3 4 8 7 11 21 13 8 7


44. Constellations
Number of classes: 6 
Data set: Number of stars in the Chinese Hellenistic constellations 
(Source: Revolvy) 


4 43 3 43 9 
11 11 4 4 2 7 12 
10 322 5 5 6 

5 9 


Extending Concepts 


45. What Would You Do? You work at a bank and are asked to
recommend the amount of cash to put in an ATM each day. You do 
not want to put in too much (which would cause security concerns) or 
too little (which may create customer irritation). Here are the daily 
withdrawals (in hundreds of dollars) for 30 days. 


72 84 61 76 104 76 86 92 80 88 98 76 97 82 84 
67 70 81 82 89 74 73 86 81 85 78 82 80 91 83 


(a) Construct a relative frequency histogram for the data. Use 8 classes. 


(b) If you put $9000 in the ATM each day, what percent of the days in 
a month should you expect to run out of cash? Explain. 


(c) If you are willing to run out of cash on 10% of the days, how much 
cash should you put in the ATM each day? Explain. 


46. What Would You Do? The admissions department for a college is
asked to recommend the minimum SAT scores that the college will 
accept for full-time students. Here are the SAT scores of 50 applicants. 


1170 1000 910 870 1070 1290 920 1470 1080 1180 
770 900 1120 1070 1370 1160 970 930 1240 1270 
1250 1330 1010 1010 1410 1130 1210 1240 960 820 
650 1010 1190 1500 1400 1270 1310 1050 950 1150 
1450 1290 1310 1100 1330 1410 840 1040 1090 1080 


(a) Construct a relative frequency histogram for the data. Use 10 classes. 


(b) If you set the minimum score at 1070, what percent of the applicants 
will meet this requirement? Explain. 


(c) If you want to accept the top 88% of the applicants, what should the 
minimum score be? Explain. 


47. Writing Use the data set listed and technology to create frequency
histograms with 5, 10, and 20 classes. Which graph displays the data
best? Explain.

2 7 3 2 11 3 15 8 4
7 11 10 1 2 12 5 6 4
9 10 13 9
2 9 15 14


2.2 


What You Should Learn 


» How to graph and interpret 
quantitative data sets using 
stem-and-leaf plots and 
dot plots 


» How to graph and interpret 
qualitative data sets using 
pie charts and Pareto charts 

» How to graph and interpret
paired data sets using scatter 
plots and time series charts 


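Number of Text Messages Sent
76 49 102 58 88 122 76 89 67 80
66 80 78 69 56 76 115 99 72 19
41 86 48 52 28 26 29 33 26 20
33 24 43 16 39 29 32 29 29 40
23 33 30 41 33 38 34 53 30 149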


Study Tip 


It is important to include a key for a stem-and-leaf plot to identify the data entries. This is done by showing an entry represented by a stem and one leaf.


SECTION 2.2 More Graphs and Displays 77 


Graphing Quantitative Data Sets • Graphing Qualitative Data Sets • Graphing Paired Data Sets


Graphing Quantitative Data Sets 


In Section 2.1, you learned several ways to display quantitative data graphically. 
In this section, you will learn more ways to display quantitative data, beginning 
with stem-and-leaf plots. Stem-and-leaf plots are examples of exploratory data 
analysis (EDA), which was developed by John Tukey in 1977. 

In a stem-and-leaf plot, each number is separated into a stem (for instance, the entry’s leftmost digits) and a leaf (for instance, the rightmost digit). You should have as many leaves as there are entries in the original data set, and the leaves should be single digits. A stem-and-leaf plot is similar to a histogram but has the advantage that the graph still contains the original data. Another advantage of a stem-and-leaf plot is that it provides an easy way to sort data.
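Splitting an entry into a stem and a one-digit leaf amounts to integer division by 10. A minimal Python sketch, assuming nonnegative whole-number data (here the 50 text-message counts from Example 1); `stem_and_leaf` is a hypothetical helper, and sorting the data first is what makes the leaves come out in increasing order.

```python
texts = [76, 49, 102, 58, 88, 122, 76, 89, 67, 80,
         66, 80, 78, 69, 56, 76, 115, 99, 72, 19,
         41, 86, 48, 52, 28, 26, 29, 33, 26, 20,
         33, 24, 43, 16, 39, 29, 32, 29, 29, 40,
         23, 33, 30, 41, 33, 38, 34, 53, 30, 149]


def stem_and_leaf(data):
    """Map each stem (all digits but the last) to its sorted leaves (last digit)."""
    low, high = min(data) // 10, max(data) // 10
    plot = {stem: [] for stem in range(low, high + 1)}  # keep empty stems such as 13
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        plot[stem].append(leaf)
    return plot


plot = stem_and_leaf(texts)
for stem, leaves in plot.items():
    print(f"{stem:>2} | {''.join(map(str, leaves))}")
print("Key: 10|2 = 102")
```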


Constructing a Stem-and-Leaf Plot 


The data set at the left lists the numbers of text messages sent in one day by 
50 cell phone users. Display the data in a stem-and-leaf plot. Describe any 
patterns. (Adapted from Pew Research) 


SOLUTION 


Because the data entries go from a low of 16 to a high of 149, you should use 
stem values from 1 to 14. To construct the plot, list these stems to the left of a 
vertical line. For each data entry, list a leaf to the right of its stem. For instance, 
the entry 102 has a stem of 10 and a leaf of 2. Make the plot with the leaves in 
increasing order from left to right. Be sure to include a key. 


Number of Text Messages Sent

 1 | 69          Key: 10|2 = 102
 2 | 0346689999
 3 | 0023333489
 4 | 011389
 5 | 2368
 6 | 679
 7 | 26668
 8 | 00689
 9 | 9
10 | 2
11 | 5
12 | 2
13 |
14 | 9


Interpretation From the display, you can see that more than 50% of the cell 
phone users sent between 20 and 50 text messages. 




Tech Tip 


You can use technology such as Minitab, StatCrunch, or Excel (with the XLSTAT add-in) to construct a stem-and-leaf plot. For instance, a StatCrunch stem-and-leaf plot for the data in Example 1 is shown below.

Variable: Number of text messages sent
Decimal point is 1 digit(s) to the right of the colon.
Leaf unit = 1
1 : 69
2 : 0346689999
3 : 0023333489
4 : 011389
5 : 2368
6 : 679
7 : 26668
8 : 00689
9 : 9

10 : 2
11 : 5
12 : 2


Study Tip
You can use stem-and-leaf 
plots to identify unusual 
data entries called outliers. 
In Examples 1 and 2, the 
data entry 149 is an outlier. 
You will learn more about 
outliers in Section 2.3. 


TRY IT YOURSELF 1 


Use a stem-and-leaf plot to organize the points scored by the 51 winning teams 
listed on page 61. Describe any patterns. Answer: Page A32 


Constructing Variations of Stem-and-Leaf Plots 


Organize the data set in Example 1 using a stem-and-leaf plot that has two 
rows for each stem. Describe any patterns. 


SOLUTION 


Use the stem-and-leaf plot from Example 1, except now list each stem twice. 
Use the leaves 0, 1, 2, 3, and 4 in the first stem row and the leaves 5, 6, 7, 8, and 
9 in the second stem row. The revised stem-and-leaf plot is shown. Notice that 
by using two rows per stem, you obtain a more detailed picture of the data. 


Number of Text Messages Sent

 1 |             Key: 10|2 = 102
 1 | 69
 2 | 034
 2 | 6689999
 3 | 00233334
 3 | 89
 4 | 0113
 4 | 89
 5 | 23
 5 | 68
 6 |
 6 | 679
 7 | 2
 7 | 6668
 8 | 00
 8 | 689
 9 |
 9 | 9
10 | 2
10 |
11 |
11 | 5
12 | 2
12 |
13 |
13 |
14 |
14 | 9


Interpretation From the display, you can see that most of the cell phone 
users sent between 20 and 80 text messages. 
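The two-rows-per-stem variation only changes how the leaves of each stem are grouped. A minimal Python sketch, assuming the same 50 text-message counts; `two_row_stems` is a hypothetical helper that puts leaves 0-4 in the first row and leaves 5-9 in the second, as in Example 2.

```python
texts = [76, 49, 102, 58, 88, 122, 76, 89, 67, 80,
         66, 80, 78, 69, 56, 76, 115, 99, 72, 19,
         41, 86, 48, 52, 28, 26, 29, 33, 26, 20,
         33, 24, 43, 16, 39, 29, 32, 29, 29, 40,
         23, 33, 30, 41, 33, 38, 34, 53, 30, 149]


def two_row_stems(data):
    """Each stem gets two rows: leaves 0-4 first, then leaves 5-9."""
    rows = []
    for stem in range(min(data) // 10, max(data) // 10 + 1):
        leaves = sorted(x % 10 for x in data if x // 10 == stem)
        rows.append((stem, [leaf for leaf in leaves if leaf <= 4]))
        rows.append((stem, [leaf for leaf in leaves if leaf >= 5]))
    return rows


for stem, leaves in two_row_stems(texts):
    print(f"{stem:>2} | {''.join(map(str, leaves))}")
```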
TRY IT YOURSELF 2 


Using two rows for each stem, revise the stem-and-leaf plot you constructed in 
Try It Yourself 1. Describe any patterns. Answer: Page A32 




You can also use a dot plot to graph quantitative data. In a dot plot, each 
data entry is plotted, using a point, above a horizontal axis. Like a stem-and-leaf 
plot, a dot plot allows you to see how data are distributed, to determine specific 
data entries, and to identify unusual data entries. 


Constructing a Dot Plot 
Use a dot plot to organize the data set in Example 1. Describe any patterns. 


Number of Text Messages Sent 
76 49 102 58 88 122 76 89 67 80 
66 80 78 69 56 76 115 99 72 19 
41 86 48 52 28 26 29 33 26 20 
33 24 43 16 39 29 32 29 29 40 
23 33 30 41 33 38 34 53 30 149 


SOLUTION 


So that each data entry is included in the dot plot, the horizontal axis should 
include numbers between 15 and 150. To represent a data entry, plot a point 
above the entry’s position on the axis. When an entry is repeated, plot another 
point above the previous point. 


[Figure: dot plot, "Number of Text Messages Sent," with the horizontal axis marked from 15 to 150 in steps of 5.]

Interpretation From the dot plot, you can see that most entries occur between 20 and 80 and only 4 people sent more than 100 text messages. You can also see that 149 is an unusual data entry.
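A dot plot is essentially a tally of how many times each value occurs. A rough text version, rotated so each distinct value gets its own line, can be sketched in Python assuming the 50 text-message counts; unlike a true dot plot, it lists only values that occur and so does not show the spacing along the axis.

```python
from collections import Counter

texts = [76, 49, 102, 58, 88, 122, 76, 89, 67, 80,
         66, 80, 78, 69, 56, 76, 115, 99, 72, 19,
         41, 86, 48, 52, 28, 26, 29, 33, 26, 20,
         33, 24, 43, 16, 39, 29, 32, 29, 29, 40,
         23, 33, 30, 41, 33, 38, 34, 53, 30, 149]

counts = Counter(texts)  # value -> number of dots stacked above it
for value in sorted(counts):
    print(f"{value:>3} | {'*' * counts[value]}")

over_100 = sum(1 for x in texts if x > 100)
print("entries above 100:", over_100)  # 4
```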


TRY IT YOURSELF 3 


Use a dot plot to organize the points scored by the 51 winning teams listed on 
page 61. Describe any patterns. 
Answer: Page A32 


Technology can be used to construct dot plots. For instance, Minitab and 
StatCrunch dot plots for the text messaging data are shown below. 


[Figure: Minitab and StatCrunch dot plots, "Number of Text Messages Sent," each with a horizontal axis marked from 20 to 150.]


[Figure: Excel pie chart, "Degrees Conferred in 2014": Associate’s 26.4%, Bachelor’s 49.1%, Master’s 19.8%, Doctoral 4.7%.]


Graphing Qualitative Data Sets 


Pie charts provide a convenient way to present qualitative data graphically 
as percents of a whole. A pie chart is a circle that is divided into sectors that 
represent categories. The area of each sector is proportional to the frequency of 
each category. In most cases, you will be interpreting a pie chart or constructing 
one using technology. Example 4 shows how to construct a pie chart by hand. 


Constructing a Pie Chart

The numbers of earned degrees conferred (in thousands) in 2014 are shown in the table at the right. Use a pie chart to organize the data. (Source: U.S. National Center for Education Statistics)

Earned Degrees Conferred in 2014
Type of degree | Number (in thousands)
Associate’s | 1003
Bachelor’s | 1870
Master’s | 754
Doctoral | 178


SOLUTION 

Begin by finding the relative frequency, or percent, of each category. The total is 1003 + 1870 + 754 + 178 = 3805 thousand degrees, so, for instance, the relative frequency of associate’s degrees is 1003/3805 ≈ 0.264. Then construct the pie chart using the central angle that corresponds to each category. To find the central angle, multiply 360° by the category’s relative frequency. For instance, the central angle for associate’s degrees is 360°(0.264) ≈ 95°.


Earned Degrees Conferred in 2014
Type of degree   f      Relative frequency   Angle
Associate’s      1003   0.264                95°
Bachelor’s       1870   0.491                177°
Master’s         754    0.198                71°
Doctoral         178    0.047                17°

[Figure: pie chart, Earned Degrees Conferred in 2014: Associate’s 26.4%, Bachelor’s 49.1%, Master’s 19.8%, Doctoral 4.7%]


Interpretation From the pie chart, you can see that almost one-half of the 
degrees conferred in 2014 were bachelor’s degrees. 
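The relative-frequency and central-angle arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a procedure from the text, and the helper name `pie_table` is hypothetical. It computes each angle from the unrounded relative frequency; for these data that gives the same whole-degree angles as multiplying 360° by the rounded values in the table.

```python
# Earned degrees conferred in 2014 (in thousands), from the table in this example
degrees = {
    "Associate's": 1003,
    "Bachelor's": 1870,
    "Master's": 754,
    "Doctoral": 178,
}

total = sum(degrees.values())  # 3805

def pie_table(counts):
    """Return (category, relative frequency, central angle in degrees) rows."""
    n = sum(counts.values())
    rows = []
    for category, f in counts.items():
        rel = f / n                      # relative frequency = f / n
        angle = round(360 * rel)         # central angle = 360° x relative frequency
        rows.append((category, round(rel, 3), angle))
    return rows

for category, rel, angle in pie_table(degrees):
    print(f"{category:<12} {rel:.3f} {angle:>4}°")
```

The four angles add to 360°, which is a quick check that no category was left out.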
TRY IT YOURSELF 4 


The numbers of earned degrees conferred (in thousands) in 1990 are shown in 
the table. Use a pie chart to organize the data. Compare the 1990 data with the 
2014 data. (Source: U.S. National Center for Education Statistics) 


Earned Degrees Conferred in 1990 


Type of degree   Number (in thousands)
Associate’s      455
Bachelor’s       1051
Master’s         330
Doctoral         104
Answer: Page A32


You can use technology to construct a pie chart. For instance, an Excel pie 
chart for the degrees conferred in 2014 is shown at the left. 


Picturing the World

According to data from the U.S. Bureau of Labor Statistics, earnings increase as educational attainment rises. The average weekly earnings data by educational attainment are shown in the Pareto chart. (Source: Based on U.S. Bureau of Labor Statistics)

[Figure: Pareto chart, Average Weekly Earnings by Educational Attainment; vertical axis: average weekly earnings (in dollars); horizontal axis: educational attainment]

The average worker with an associate’s degree makes how much more in a year (52 weeks) than the average worker with a high school diploma?


SECTION 2.2 More Graphs and Displays 81 


Another way to graph qualitative data is to use a Pareto chart. A Pareto 
chart is a vertical bar graph in which the height of each bar represents frequency 
or relative frequency. The bars are positioned in order of decreasing height, with 
the tallest bar positioned at the left. Such positioning helps highlight important 
data and is used frequently in business. 


Constructing a Pareto Chart 
In 2014, these were the leading causes of death in the United States. 
Accidents: 136,053 
Cancer: 591,699 
Chronic lower respiratory disease: 147,101 
Heart disease: 614,348 
Stroke (cerebrovascular diseases): 133,103 


Use a Pareto chart to organize the data. What was the leading cause of death 
in the United States in 2014? (Source: Health, United States, 2015, Table 19) 


SOLUTION 


Using frequencies for the vertical axis, you can construct the Pareto chart as 
shown. 


[Figure: Pareto chart, Top Five Causes of Death in the United States; vertical axis: deaths (in thousands); horizontal axis: cause, in order: heart disease, cancer, chronic lower respiratory disease, accidents, stroke (cerebrovascular diseases)]


Interpretation From the Pareto chart, you can see that the leading cause of death in the United States in 2014 was heart disease. Also, heart disease and cancer caused more deaths than the other three causes combined.
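The only computational step in building a Pareto chart is ordering the categories by decreasing frequency. The minimal Python sketch below assumes the five counts listed in this example; `pareto_order` is a hypothetical helper name, and the code does not draw the chart. It also checks the interpretation that heart disease and cancer together exceed the other three causes combined.

```python
# Leading causes of death in the United States in 2014 (counts from the example)
deaths_2014 = {
    "Accidents": 136_053,
    "Cancer": 591_699,
    "Chronic lower respiratory disease": 147_101,
    "Heart disease": 614_348,
    "Stroke (cerebrovascular diseases)": 133_103,
}

def pareto_order(counts):
    """Return (category, frequency) pairs sorted by decreasing frequency,
    which is the left-to-right bar order of a Pareto chart."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)

ordered = pareto_order(deaths_2014)
for cause, deaths in ordered:
    print(f"{cause:<35} {deaths / 1000:8.1f} thousand")

# Check the interpretation: the two tallest bars versus the other three combined
top_two = sum(count for _, count in ordered[:2])
other_three = sum(count for _, count in ordered[2:])
print(top_two, other_three, top_two > other_three)  # 1206047 416257 True
```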
TRY IT YOURSELF 5 
Every year, the Better Business Bureau (BBB) receives complaints from 
customers. Here are some complaints the BBB received in a recent year. 
16,281 complaints about auto dealers (used cars) 
8384 complaints about insurance companies 
3634 complaints about mortgage brokers 
19,277 complaints about collection agencies 
6985 complaints about travel agencies and bureaus 


Use a Pareto chart to organize the data. Which industry is the greatest cause 
of complaints? (Source: Council of Better Business Bureaus) 
Answer: Page A32 




Graphing Paired Data Sets 


When each entry in one data set corresponds to one entry in a second data set, 
the sets are called paired data sets. For instance, a data set contains the costs of 
an item and a second data set contains sales amounts for the item at each cost. 
Because each cost corresponds to a sales amount, the data sets are paired. One 
way to graph paired data sets is to use a scatter plot, where the ordered pairs 
are graphed as points in a coordinate plane. A scatter plot is used to show the 
relationship between two quantitative variables. 


Interpreting a Scatter Plot 


The British statistician Ronald Fisher (see page 57) introduced a famous 
data set called Fisher’s Iris data set. This data set describes various physical 
characteristics, such as petal length and petal width (in millimeters), for three 
species of iris. In the scatter plot shown, the petal lengths form the first data 
set and the petal widths form the second data set. As the petal length increases, 
what tends to happen to the petal width? (Source: Fisher, R. A., 1936) 


[Figure: scatter plot, Fisher’s Iris Data Set; horizontal axis: petal length (in millimeters), 10 to 70; vertical axis: petal width (in millimeters)]


SOLUTION 


The horizontal axis represents the petal length, and the vertical axis represents 
the petal width. Each point in the scatter plot represents the petal length and 
petal width of one flower. 

Interpretation From the scatter plot, you can see that as the petal length 
increases, the petal width also tends to increase. 


TRY IT YOURSELF 6 


The lengths of employment and the salaries of 10 employees are listed in the 
table below. Graph the data using a scatter plot. Describe any trends. 


Length of employment (in years)   5        4        8        4        2
Salary (in dollars)               32,000   32,500   40,000   27,350   25,000

Length of employment (in years)   10       7        6        9        3
Salary (in dollars)               43,000   41,650   39,225   45,100   28,000


Answer: Page A33 


You will learn more about scatter plots and how to analyze them in 
Chapter 9. 




A data set that is composed of quantitative entries taken at regular intervals 
over a period of time is called a time series. For instance, the amount of 
precipitation measured each day for one month is a time series. You can use a 
time series chart to graph a time series. 


See Minitab and TI-84 Plus 
steps on pages 146 and 147. 


Constructing a Time Series Chart 


The table lists the number of motor vehicle thefts (in millions) and burglaries 
(in millions) in the United States for the years 2005 through 2015. Construct 
a time series chart for the number of motor vehicle thefts. Describe any 
trends. (Source: Federal Bureau of Investigation, Crime in the United States) 


Year   Motor vehicle thefts (in millions)   Burglaries (in millions)
2005 1.24 2.16 
2006 1.20 2.19 
2007 1.10 2.19 
2008 0.96 2.23 
2009 0.80 2.20 
2010 0.74 2.17 
2011 0.72 2.19 
2012 0.72 2.11 
2013 0.70 1.93 
2014 0.69 1.71 
2015 0.71 1.58 


SOLUTION 


Let the horizontal axis represent the years and let the vertical axis represent 
the number of motor vehicle thefts (in millions). Then plot the paired data and 
connect them with line segments.


[Figure: time series chart, Motor Vehicle Thefts; horizontal axis: year (2005 to 2015); vertical axis: thefts (in millions)]


Interpretation The time series chart shows that the number of motor vehicle 
thefts decreased until 2011 and then remained about the same through 2015. 
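One way to make the trend described above concrete is to list the year-over-year change in the thefts column. The sketch below is an illustration only, not part of the text's method: `year_over_year` is a hypothetical helper, and it computes differences rather than drawing the chart.

```python
# Motor vehicle thefts (in millions), from the table in this example
thefts = {
    2005: 1.24, 2006: 1.20, 2007: 1.10, 2008: 0.96, 2009: 0.80, 2010: 0.74,
    2011: 0.72, 2012: 0.72, 2013: 0.70, 2014: 0.69, 2015: 0.71,
}

def year_over_year(series):
    """Return (year, change from the previous year) pairs, rounded to 0.01 million."""
    years = sorted(series)
    return [(year, round(series[year] - series[prev], 2))
            for prev, year in zip(years, years[1:])]

changes = year_over_year(thefts)
for year, change in changes:
    print(f"{year}: {change:+.2f} million")
```

Every change through 2011 is negative, and from 2012 on each change is at most 0.02 million in size, which matches "decreased until 2011 and then remained about the same."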


TRY IT YOURSELF 7 


Use the table in Example 7 to construct a time series chart for the number of 
burglaries for the years 2005 through 2015. Describe any trends. 
Answer: Page A33 




2.2 EXERCISES For Extra Help: MyLab Statistics 


Building Basic Skills and Vocabulary 


1. Name some ways to display quantitative data graphically. Name some ways 
to display qualitative data graphically. 


2. What is an advantage of using a stem-and-leaf plot instead of a histogram? 
What is a disadvantage? 


3. In terms of displaying data, how is a stem-and-leaf plot similar to a dot plot? 


4. How is a Pareto chart different from a standard vertical bar graph? 


Putting Graphs in Context In Exercises 5-8, match the plot with the description of the sample.


5. 0|8        Key: 0|8 = 0.8
   1|568
   2|1345
   3|09
   4|00

6. 6|78       Key: 6|7 = 67
   7|1455888
   8|1355889
   9|00024

7. and 8. [Figures: two dot plots; the recoverable axis labels run from 200 to 220]


(a) Times (in minutes) it takes a sample of employees to drive to work
(b) Grade point averages of a sample of students with finance majors
(c) Top speeds (in miles per hour) of a sample of high-performance sports cars
(d) Ages (in years) of a sample of residents of a retirement home


Graphical Analysis In Exercises 9-12, use the stem-and-leaf plot or dot plot to list the actual data entries. What is the maximum data entry? What is the minimum data entry?


9. 2|9        Key: 2|7 = 27
   3|2
   4|1334778
   5|0112333444456689
   6|888
   7|388
   8|5

10. 12|9      Key: 12|9 = 12.9
    13|3
    13|677
    14|1111344
    14|699
    15|000124
    15|678889
    16|1
    16|67

11. [Figure: dot plot]

12. [Figure: dot plot; horizontal axis from 215 to 235]


Drunk Driving Cases
30 32 28 36 25
TABLE FOR EXERCISE 20


Using and Interpreting Concepts 


Graphical Analysis In Exercises 13-16, give three observations that can be made from the graph.

13. [Figure: Monthly Active Users on 5 Social Networking Sites as of September 2016; horizontal axis: site] (Source: Statista)

14. [Figure: Motor Vehicle Thefts at U.S. Universities and Colleges; horizontal axis: year, 2010 to 2015] (Source: Federal Bureau of Investigation)

15. [Figure: pie chart, Least Popular American Drivers: The Tailgater 14%, The Last-Minute Line-Cutter 13%, The Left-Lane Hog 11%, The Multitasker 8%, The Swerver 8%, The Crawler 8%, The Speeder 5%, The Drifter] (Source: Expedia)

16. [Figure: Amount Spent on Pet Care; vertical axis: amount spent (in billions); horizontal axis: type of care] (Source: American Pet Products Association)

Graphing Data Sets In Exercises 17-32, organize the data using the indicated type of graph. Describe any patterns.

17. Humidity Use a stem-and-leaf plot to display the data, which represent the humidity (in percentages) in the atmosphere as measured on 20 different days in a city.

20.8 20.5 21.0 21.3 18.6 20.8 19.6 19.4 19.2 21.5
22.6 21.8 22.5 22.8 20.1 21.6 21.4 20.8 19.6 19.9

18. Studying Use a stem-and-leaf plot to display the data, which represent the numbers of hours 24 students study per week.

20 24 25 18 15 16 23 11 18 29 35 42
16 18 23 19 28 26 24 21 32 25 30 17

19. Runs Scored Use a stem-and-leaf plot to display the data, which represent the runs scored by a batsman in a World Cup series.

70 75 71 73 78 70 90 91 94 85 87 99
75 88 86 82 78 79 81 77 90 73 91 98

20. Drunk Driving Use a stem-and-leaf plot to display the data shown in the table at the left, which represent the drunk driving cases registered at 30 strategic road intersections.


Hours   Hourly Fee
28      10.83
35      16.25
33      11.13
29      12.15
34      16.41
38      15.33
43      14.98
51      16.15
63      15.13
28      12.18
39      13.19
48      14.14
TABLE FOR EXERCISE 29


21. Highest-Paid Tech CEOs Use a stem-and-leaf plot that has two rows for each stem to display the data, which represent the incomes (in millions) of the top 30 highest-paid tech CEOs. (Source: Business Insider)

41 17 33 25 28 28 32 20 16 22 19 15 19
14 13 25 14 41 20 19 20 33 25 20 22 18
19 15 13 14

22. Salaries The salaries (in thousand dollars) of a sample of 10 employees

225 410 368 310 228 298 361 159 486 296

23. Blood Glucose Levels Use a dot plot to display the data, which represent the blood glucose levels (in milligrams per deciliter) of 24 patients at a pathology laboratory.

68 73 75 82 91 96 99 65 71 62 83 87
101 94 82 96 100 78 73 78 94 63 85 97

24. Weights of Adult Polar Bears Use a dot plot to display the data, which represent the weights (in kilograms) of 20 polar bears.

426 428 436 545 510 386 480 485 486 399
525 501 425 442 369 510 525 408 399 403

25. Student Loans Use a pie chart to display the data, which represent the numbers of student loan borrowers (in millions) by balance owed in the fourth quarter of 2015. (Source: Federal Reserve Bank of New York)

$1 to $10,000        16.7     $10,001 to $25,000   12.4
$25,001 to $50,000   8.3      $50,001+             6.7

26. New York City Marathon Use a pie chart to display the data, which represent the number of men’s New York City Marathon winners from each country through 2016. (Source: New York Road Runners)

United States 15    Tanzania 1    Great Britain 1
Italy 4             Kenya 12      Brazil 2
Ethiopia 2          Mexico 4      New Zealand 1
South Africa 2      Morocco 1     Eritrea 1

27. FIFA World Cup The five countries that have won the FIFA World Cup more than once are Uruguay (2), Italy (4), Germany (4), Brazil (5), and Argentina (2). Use a Pareto chart to display the data. (Source: FIFA)

28. Vehicle Costs The average owning and operating costs for four types of vehicles in the United States in 2016 include small sedans ($6579), medium sedans ($8604), SUVs ($10,255), and minivans ($9262). Use a Pareto chart to display the data. (Source: American Automobile Association)


29. Hourly Fees Use a scatter plot to display the data shown in the table 
at the left. The data represent the numbers of coaching hours and the 
hourly fees (in dollars) of 12 cricket coaches. 


Hours   Hourly Fee
28      10.83
35      16.25
33      11.13
29      12.15
34      16.41
38      15.33
43      14.98
51      16.15
63      15.13
28      12.18
39      13.19
48      14.14
TABLE FOR EXERCISE 30

30. Hourly Fees Use a scatter plot to display the data shown in the table at the left. The data represent the numbers of coaching hours and the hourly fees (in dollars) of 12 cricket coaches.

31. Engineering Degrees Use a time series chart to display the data shown in the table. The data represent the numbers of bachelor’s degrees in engineering (in thousands) conferred in the U.S. (Source: American Society for Engineering Education)

Year      2008   2009   2010   2011   2012   2013   2014   2015
Degrees   74.2   74.4   78.3   83.0   88.2   93.4   99.2   106.7

32. Tourism Use a time series chart to display the data shown in the table. The data represent the percentages of Egypt’s gross domestic product (GDP) that come from the travel and tourism sector. (Source: Knoema)

Year      2005    2006    2008    2009    2010
Percent   19.1%   19.0%   19.0%   17.1%   16.7%

Year      2011    2012    2014    2015    2016
Percent   12.8%   12.1%   9.2%    8.6%    7.2%

33. Basketball Display the data below in a stem-and-leaf plot. Describe the differences in how the dot plot and the stem-and-leaf plot show patterns in the data.

[Figure: dot plot, Heights of Players on a College Basketball Team; horizontal axis: inches]

34. Phone Screen Sizes Display the data below in a dot plot. Describe the differences in how the stem-and-leaf plot and the dot plot show patterns in the data.

[Figure: stem-and-leaf plot, Phone Screen Sizes (in inches); Key: 5|0 = 5.0]

35. Favorite Season Display the data below in a Pareto chart. Describe the differences in how the pie chart and the Pareto chart show patterns in the data. (Source: Ipsos Public Affairs)

[Figure: pie chart, Favorite Season of U.S. Adults Ages 18 and Older; sectors include Winter 8% and one labeled 19.5%]

36. Favorite Day of the Week Display the data below in a pie chart. Describe the differences in how the Pareto chart and the pie chart show patterns in the data.

[Figure: Pareto chart, Favorite Day of the Week; vertical axis: number of people]




      Law Firm A          Law Firm B
             5 0 |  9 | 0 3
         8 5 2 2 2 | 10 | 5 7
         9 9 7 0 0 | 11 | 0 0 5
               1 1 | 12 | 0 3 3 5
                   | 13 | 2 2 5 9
                   | 14 | 1 3 3 3 9
                   | 15 | 5 5 5 6
                   | 16 | 4 9 9
         9 9 5 1 0 | 17 | 1 2 5
         5 5 5 2 1 | 18 | 9
         9 9 8 7 5 | 19 | 0
                 3 | 20 |

Key: 5|19|0 = $195,000 for Law Firm A and $190,000 for Law Firm B

FIGURE FOR EXERCISE 41

3:00 p.m. Class
40 60 73 77 19 18 20

8:00 p.m. Class
29

TABLE FOR EXERCISE 42


Extending Concepts 


A Misleading Graph? A misleading graph is not drawn appropriately, which 
can misrepresent data and lead to false conclusions. In Exercises 37-40, (a) explain 
why the graph is misleading, and (b) redraw the graph so that it is not misleading. 


37. [Figure: Sales for Company A; vertical axis: sales (in thousands of dollars); horizontal axis: quarter]

38. [Figure: Results of a Survey; vertical axis: percent that responded “yes” (labeled 60, 64, 68); horizontal axis: type of student (middle school, high school, college/university)]

39. [Figure: pie chart, Sales for Company B; sectors labeled 1st quarter, 2nd quarter, 3rd quarter, and 4th quarter]

40. [Figure: U.S. Crude Oil Imports by Country of Origin, January-August 2016; vertical axis: barrels (in millions); horizontal axis: OPEC countries, non-OPEC countries] (Source: U.S. Energy Information Administration)


41. Law Firm Salaries A back-to-back stem-and-leaf plot compares two data 
sets by using the same stems for each data set. Leaves for the first data set 
are on one side while leaves for the second data set are on the other side. The 
back-to-back stem-and-leaf plot at the left shows the salaries (in thousands 
of dollars) of all lawyers at two small law firms. 


(a) What are the lowest and highest salaries at Law Firm A? at Law Firm B? How many lawyers are in each firm?

(b) Compare the distribution of salaries at each law firm. What do you notice?

42. Yoga Classes The data sets at the left show the ages of all participants in two yoga classes.

(a) Make a back-to-back stem-and-leaf plot as described in Exercise 41 to display the data.

(b) What are the lowest and highest ages of participants in the 3:00 p.m. class? in the 8:00 p.m. class? How many participants are in each class?

(c) Compare the distribution of ages in each class. What observation(s) can you make?


43. Choosing an Appropriate Display Use technology to create (a) a stem-and-leaf plot, (b) a dot plot, (c) a pie chart, (d) a frequency histogram, and (e) an ogive for the data. Which graph displays the data best? Explain.


64 46 40 55 70 31 47 44 55 63 
49 49 26 72 64 55 44 71 45 72 


2.3

What You Should Learn

» How to find the mean, median, and mode of a population and of a sample

» How to find a weighted mean of a data set, and how to estimate the sample mean of grouped data

» How to describe the shape of a distribution as symmetric, uniform, or skewed, and how to compare the mean and median for each

Study Tip

Notice that the mean in Example 1 has one more decimal place than the original set of data entries. When the mean needs to be rounded, this round-off rule will be used in the text. Another important round-off rule is that rounding should not be done until the last calculation.


SECTION 2.3 Measures of Central Tendency 89 


Mean, Median, and Mode ■ Weighted Mean and Mean of Grouped Data ■ The Shapes of Distributions


Mean, Median, and Mode 


In Sections 2.1 and 2.2, you learned about the graphical representations of 
quantitative data. In Sections 2.3 and 2.4, you will learn how to supplement 
graphical representations with numerical statistics that describe the center and 
variability of a data set. 

A measure of central tendency is a value that represents a typical, or central, 
entry of a data set. The three most commonly used measures of central tendency 
are the mean, the median, and the mode. 


DEFINITION 


The mean of a data set is the sum of the data entries divided by the number 
of entries. To find the mean of a data set, use one of these formulas. 


Population Mean: $\mu = \dfrac{\Sigma x}{N}$

Sample Mean: $\bar{x} = \dfrac{\Sigma x}{n}$

The lowercase Greek letter $\mu$ (pronounced mu) represents the population mean and $\bar{x}$ (read as “x bar”) represents the sample mean. Note that N represents the number of entries in a population and n represents the number of entries in a sample. Recall that the uppercase Greek letter sigma ($\Sigma$) indicates a summation of values.


Finding a Sample Mean 


The weights (in pounds) for a sample of adults before starting a weight-loss 
study are listed. What is the mean weight of the adults? 


274 235 223 268 290 285 235 


SOLUTION The sum of the weights is
$\Sigma x = 274 + 235 + 223 + 268 + 290 + 285 + 235 = 1810.$

There are 7 adults in the sample, so n = 7. To find the mean weight, divide the sum of the weights by the number of adults in the sample.

$\bar{x} = \dfrac{\Sigma x}{n} = \dfrac{1810}{7} \approx 258.6$    Round the last calculation to one more decimal place than the original data.


So, the mean weight of the adults is about 258.6 pounds. 
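The sample mean formula and the round-off rule (round only at the last calculation) can be checked with a short Python sketch. `sample_mean` is a hypothetical helper name; Python's standard `statistics.mean` is shown alongside it as a cross-check.

```python
import statistics

def sample_mean(data):
    """x-bar = (sum of the entries) / (number of entries)."""
    return sum(data) / len(data)

weights = [274, 235, 223, 268, 290, 285, 235]  # pounds, from Example 1
x_bar = sample_mean(weights)                   # keep full precision here

# Round only at the last step, to one more decimal place than the data.
print(round(x_bar, 1))                     # 258.6
print(round(statistics.mean(weights), 1))  # 258.6 (standard-library check)
```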


TRY IT YOURSELF 1 


Find the mean of the points scored by the 51 winning teams listed on page 61. 
Answer: Page A33 


Tech Tip

You can use technology such as Minitab, Excel, StatCrunch, or the TI-84 Plus to find the mean and median of a data set. For instance, to find the mean and median of the weights listed in Example 1 on a TI-84 Plus, enter the data in L1. Next, press [2nd] LIST and from the MATH menu choose mean. Then press [2nd] LIST and from the MATH menu choose median.

[Screen: TI-84 Plus showing mean(L1) = 258.5714286 and median(L1) = 268]


DEFINITION 


The median of a data set is the value that lies in the middle of the data when the data set is ordered. The median measures the center of an ordered data set by dividing it into two equal parts. When the data set has an odd number of entries, the median is the middle data entry. When the data set has an even number of entries, the median is the mean of the two middle data entries.


Finding the Median 

Find the median of the weights listed in Example 1. 

SOLUTION To find the median weight, first order the data. 
223 235 235 268 274 285 290 


Because there are seven entries (an odd number), the median is the middle, or 
fourth, entry. So, the median weight is 268 pounds. 


TRY IT YOURSELF 2 


Find the median of the points scored by the 51 winning teams listed on page 61. 
Answer: Page A33 


In a data set, the number of data entries above the median is the same as the number below the median. For instance, in Example 2, three of the weights are below 268 pounds and three are above 268 pounds.


Finding the Median 
In Example 2, the adult weighing 285 pounds decides to not participate in the 
study. What is the median weight of the remaining adults? 


SOLUTION The remaining weights, in order, are 
223 235 235 268 274 290. 


Because there are six entries (an even number), the median is the mean of the two middle entries.

$\text{Median} = \dfrac{235 + 268}{2} = 251.5$


So, the median weight of the remaining adults is 251.5 pounds. You can check 
your answer using technology, as shown below using Excel. 


[Screen: Excel, MEDIAN(223,235,235,268,274,290) returns 251.5]
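The two cases of the median definition (odd and even numbers of entries) can be sketched as follows. This is a minimal illustration of the definition in this section; `median` here is a hypothetical helper, and the standard library's `statistics.median` applies the same rule.

```python
import statistics

def median(data):
    """Middle entry of the ordered data; mean of the two middle entries when n is even."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: the single middle entry
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: mean of the two middle entries

weights = [274, 235, 223, 268, 290, 285, 235]
print(median(weights))               # 268 (7 entries: the 4th ordered entry)

remaining = [w for w in weights if w != 285]   # the 285-pound adult leaves the study
print(median(remaining))             # 251.5 (6 entries: mean of 235 and 268)
print(statistics.median(remaining))  # 251.5
```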


TRY IT YOURSELF 3 


The points scored by the winning teams in the Super Bowls for the National 
Football League’s 2001 through 2016 seasons are listed. Find the median. 


20 48 32 24 21 29 17 27
31 31 21 34 43 28 24 34
Answer: Page A33




DEFINITION 


The mode of a data set is the data entry that occurs with the greatest frequency. A data set can have one mode, more than one mode, or no mode. When no entry is repeated, the data set has no mode. When two entries occur with the same greatest frequency, each entry is a mode and the data set is called bimodal.


Finding the Mode 
Find the mode of the weights listed in Example 1. 


SOLUTION 
To find the mode, first order the data. 
223 235 235 268 274 285 290


From the ordered data, you can see that the entry 235 occurs twice, whereas the 
other data entries occur only once. So, the mode of the weights is 235 pounds. 


TRY IT YOURSELF 4 


Find the mode of the points scored by the 51 winning teams listed on page 61. 
Answer: Page A33 


Finding the Mode 


At a political debate, a sample of audience members were asked to name the 
political party to which they belonged. Their responses are shown in the table. 
What is the mode of the responses? 


Political party Frequency, f 


Democrat 46 
Republican 34 
Independent 39 
Other/don’t know 5 


SOLUTION 


The response occurring with the greatest frequency is Democrat. So, the mode 
is Democrat. 


Interpretation In this sample, there were more Democrats than people of any other single affiliation.
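The mode definition, including the no-mode and bimodal cases, can be sketched with Python's `collections.Counter`. The helper `modes` is hypothetical; it treats a data set in which no entry repeats as having no mode, as the definition in this section states. The last lines show that the same counting works for qualitative (nominal) data such as party affiliation.

```python
from collections import Counter

def modes(data):
    """Return every entry that occurs with the greatest frequency.

    An empty list means no entry is repeated (no mode);
    two entries mean the data set is bimodal.
    """
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []  # no entry is repeated, so there is no mode
    return [value for value, f in counts.items() if f == top]

weights = [274, 235, 223, 268, 290, 285, 235]
print(modes(weights))  # [235]

# Qualitative data: the frequency table from the political debate example
party = Counter({"Democrat": 46, "Republican": 34, "Independent": 39, "Other/don't know": 5})
print(party.most_common(1))  # [('Democrat', 46)]
```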


TRY IT YOURSELF 5 


In a survey, 1534 adults were asked, “How much do you, personally, care about 
the issue of global climate change?” Of those surveyed, 550 said “a great deal,” 
578 said “some,” 274 said “not too much,” 119 said “not at all,” and 13 did 
not provide an answer. What is the mode of the responses? (Adapted from Pew Research Center)
Answer: Page A33


The mode is the only measure of central tendency that can be used to 
describe data at the nominal level of measurement. But when working with 
quantitative data, the mode is rarely used. 


Ages in a class
20 20 20 20 20 20 21
21 21 21 22 22 22 23
23 23 23 24 24 65

Picturing the World

The National Association of Realtors keeps track of existing-home sales. One list uses the median price of existing homes sold and another uses the mean price of existing homes sold. The sales for the third quarter of 2016 are shown in the double-bar graph. (Source: National Association of Realtors)

[Figure: double-bar graph, 2016 U.S. Existing-Home Sales; median price and mean price for July, Aug., and Sept.; vertical axis: existing-home price (in thousands of dollars)]

Notice in the graph that each month, the mean price is about $42,000 more than the median price. Identify a factor that would cause the mean price to be greater than the median price.


Although the mean, the median, and the mode each describe a typical entry 
of a data set, there are advantages and disadvantages of using each. The mean 
is a reliable measure because it takes into account every entry of a data set. The 
mean can be greatly affected, however, when the data set contains outliers. 


DEFINITION 


An outlier is a data entry that is far removed from the other entries in the data set. (You will learn a formal way for determining an outlier in Section 2.5.)


While some outliers are valid data, other outliers may occur due to 
data-recording errors. A data set can have one or more outliers, causing gaps in 
a distribution. Conclusions that are drawn from a data set that contains outliers 
may be flawed. 


Comparing the Mean, the Median, and the Mode 


The table at the left shows the sample ages of students in a class. Find the 
mean, median, and mode of the ages. Are there any outliers? Which measure 
of central tendency best describes a typical entry of this data set? 


SOLUTION 


From the histogram below, it appears that the data entry 65 is an outlier 
because it is far removed from the other ages in the class. 


Mean: $\bar{x} = \dfrac{\Sigma x}{n} = \dfrac{475}{20} \approx 23.8$ years

Median: $\text{Median} = \dfrac{21 + 22}{2} = 21.5$ years

Mode: The entry occurring with the greatest frequency is 20 years.


Interpretation The mean takes every entry into account but is influenced 
by the outlier of 65. The median also takes every entry into account, and it is 
not affected by the outlier. In this case the mode exists, but it does not appear 
to represent a typical entry. Sometimes a graphical comparison can help you 
decide which measure of central tendency best represents a data set. The 
histogram shows the distribution of the data and the locations of the mean, the 
median, and the mode. In this case, it appears that the median best describes 
the data set. 
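A quick check of the three measures for the ages data, using Python's standard `statistics` module. This is a sketch for verification, not the text's procedure; the list is built from the table of ages in this example.

```python
import statistics

# Ages in a class, from the table in this example (20 entries, including the outlier 65)
ages = [20] * 6 + [21] * 4 + [22] * 3 + [23] * 4 + [24] * 2 + [65]

mean_age = statistics.mean(ages)      # 475 / 20, pulled upward by the outlier
median_age = statistics.median(ages)  # mean of the 10th and 11th ordered entries
mode_age = statistics.mode(ages)      # most frequent entry

print(round(mean_age, 1), median_age, mode_age)  # 23.8 21.5 20
```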


[Figure: histogram, Ages of Students in a Class, marking the mode, the median, the mean, and the outlier at 65; horizontal axis: age]


TRY IT YOURSELF 6 


Remove the data entry 65 from the data set in Example 6. Then rework the 
example. How does the absence of this outlier change each of the measures? 
Answer: Page A33 


Tech Tip

You can use technology such as Minitab, Excel, StatCrunch, or the TI-84 Plus to find the weighted mean. For instance, to find the weighted mean in Example 7 on a TI-84 Plus, enter the points in L1 and the credit hours in L2. Then, use the 1-Var Stats feature with L1 as the list and L2 as the frequency list to calculate the mean (and other statistics), as shown below.

[Screen: TI-84 Plus 1-Var Stats output: x̄ = 2.5 (mean), Σx = 40, Σx² = 112, Sx = .894427191, σx = .8660254038, n = 16]




Weighted Mean and Mean of Grouped Data 


Sometimes data sets contain entries that have a greater effect on the mean 
than do other entries. To find the mean of such a data set, you must find the 
weighted mean. 


DEFINITION 


A weighted mean is the mean of a data set whose entries have varying 
weights. The weighted mean is given by

$\bar{x} = \dfrac{\Sigma (x \cdot w)}{\Sigma w}$    (numerator: sum of the products of the entries and the weights; denominator: sum of the weights)

where w is the weight of each entry x.


Finding a Weighted Mean 


Your grades from last semester are in the table. The grading system assigns points as follows: A = 4, B = 3, C = 2, D = 1, F = 0. Determine your grade point average (weighted mean).

Final Grade   Credit Hours
C             3
C             4
D             1
A             3
C             2
B             3


SOLUTION 


Let x be the points assigned to the letter grade and w be the credit hours. You 
can organize the points and hours in a table. 


Points, x   Credit hours, w   xw
2           3                 6
2           4                 8
1           1                 1
4           3                 12
2           2                 4
3           3                 9
            Σw = 16           Σ(x·w) = 40

$\bar{x} = \dfrac{\Sigma (x \cdot w)}{\Sigma w} = \dfrac{40}{16} = 2.5$

Last semester, your grade point average was 2.5.
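The weighted mean formula can be sketched directly in Python. `weighted_mean` is a hypothetical helper; the points and credit hours below are the six courses in this example, consistent with the totals Σw = 16 and Σ(x·w) = 40 given in the solution.

```python
def weighted_mean(entries, weights):
    """x-bar = sum(x * w) / sum(w), where w is the weight of each entry x."""
    if len(entries) != len(weights):
        raise ValueError("each entry needs exactly one weight")
    return sum(x * w for x, w in zip(entries, weights)) / sum(weights)

points = [2, 2, 1, 4, 2, 3]        # C, C, D, A, C, B
credit_hours = [3, 4, 1, 3, 2, 3]  # the weights
print(weighted_mean(points, credit_hours))  # 2.5  (40 / 16)
```

Note that an unweighted mean of the six point values would be 14/6 ≈ 2.3, which ignores that a 4-credit course counts more than a 1-credit course.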


TRY IT YOURSELF 7 


In Example 7, your grade in the two-credit course is changed to a B. What is 
your new weighted mean? 


Answer: Page A33 




For data presented in a frequency distribution, you can estimate the mean as 
shown in the next definition. 


DEFINITION

The mean of a frequency distribution for a sample is estimated by

$\bar{x} = \dfrac{\Sigma (x \cdot f)}{n}$    Note that $n = \Sigma f$.

where x and f are the midpoint and frequency of each class, respectively.

Study Tip
For a frequency distribution that represents a population, the mean of the frequency distribution is estimated by
$\mu = \dfrac{\Sigma (x \cdot f)}{N}$ where $N = \Sigma f$.

GUIDELINES

Finding the Mean of a Frequency Distribution
In Words                                                                In Symbols
1. Find the midpoint of each class.                                     $x = \dfrac{(\text{Lower limit}) + (\text{Upper limit})}{2}$
2. Find the sum of the products of the midpoints and the frequencies.   $\Sigma (x \cdot f)$
3. Find the sum of the frequencies.                                     $n = \Sigma f$
4. Find the mean of the frequency distribution.                         $\bar{x} = \dfrac{\Sigma (x \cdot f)}{n}$


EXAMPLE 8 


Finding the Mean of a Frequency Distribution 


The frequency distribution at the left shows the out-of-pocket prescription medicine expenses (in dollars) for 30 U.S. adults in a recent year. Use the frequency distribution to estimate the mean expense. Using the sample mean formula from page 89 with the original data set (see Example 1 in Section 2.1), the mean expense is $285.50. Compare this with the estimated mean.

Class midpoint, x   Frequency, f   x·f
172.5               3              517.5
208.5               2              417.0
244.5               5              1222.5
280.5               6              1683.0
316.5               7              2215.5
352.5               4              1410.0
388.5               3              1165.5
                    n = 30         Σ(x·f) = 8631

SOLUTION

$\bar{x} = \dfrac{\Sigma (x \cdot f)}{n} = \dfrac{8631}{30} = 287.7$


Interpretation The mean expense is $287.70. This value is an estimate 
because it is based on class midpoints instead of the original data set. Although 
it is not substantially different, the mean of $285.50 found using the original 
data set is a more accurate result. 
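The guideline steps can be sketched in Python using the class midpoints and frequencies from Example 8. `grouped_mean` is a hypothetical helper name. Because every midpoint is a multiple of 0.5, the products sum to exactly 8631 in floating point.

```python
def grouped_mean(midpoints, frequencies):
    """Estimate the mean of a frequency distribution: sum(x * f) / n, with n = sum(f)."""
    n = sum(frequencies)                                        # step 3
    total = sum(x * f for x, f in zip(midpoints, frequencies))  # step 2
    return total / n                                            # step 4

midpoints = [172.5, 208.5, 244.5, 280.5, 316.5, 352.5, 388.5]  # step 1 (already computed)
frequencies = [3, 2, 5, 6, 7, 4, 3]
print(round(grouped_mean(midpoints, frequencies), 1))  # 287.7
```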


TRY IT YOURSELF 8 


Use a frequency distribution to estimate the mean of the points scored by the 
51 winning teams listed on page 61. (See Try It Yourself 2 on page 65.) Using 
the population mean formula from page 89 with the original data set, the mean 
is about 30.2 points. Compare this with the estimated mean. 

Answer: Page A33 


SECTION 2.3 Measures of Central Tendency 95 


The Shapes of Distributions 


A graph reveals several characteristics of a frequency distribution. One such characteristic is the shape of the distribution.

DEFINITION
A frequency distribution is symmetric when a vertical line can be drawn through the middle of a graph of the distribution and the resulting halves are approximately mirror images.
A frequency distribution is uniform (or rectangular) when all entries, or classes, in the distribution have equal or approximately equal frequencies. A uniform distribution is also symmetric.
A frequency distribution is skewed when the “tail” of the graph elongates more to one side than to the other. A distribution is skewed left (negatively skewed) when its tail extends to the left. A distribution is skewed right (positively skewed) when its tail extends to the right.

Study Tip
The graph of a symmetric distribution is not always bell-shaped (see below). Some of the other possible shapes for the graph of a symmetric distribution are shown.
[Figure: other possible shapes for the graph of a symmetric distribution]

To explore this topic further, see Activity 2.3 on page 103.

When a distribution is symmetric and unimodal, the mean, median, and mode are equal. When a distribution is skewed left, the mean is less than the median and the median is usually less than the mode. When a distribution is skewed right, the mean is greater than the median and the median is usually greater than the mode. Examples of these commonly occurring distributions are shown.


[Figures: Symmetric Distribution (mean, median, and mode at the center); Uniform Distribution (mean and median at the center); Skewed Left Distribution (from left to right: mean, median, mode); Skewed Right Distribution (from left to right: mode, median, mean). Each graph plots frequency (0 to 45) against data values from 1 to 15.]

Be aware that there are many different shapes of distributions. In some cases, the shape cannot be classified as symmetric, uniform, or skewed. A distribution can have several gaps caused by outliers or clusters of data. Clusters may occur when several types of data entries are used in a data set. For instance, a data set of gas mileages for trucks (which get low gas mileage) and hybrid cars (which get high gas mileage) would have two clusters.

The mean will always fall in the direction in which the distribution is 
skewed. For instance, when a distribution is skewed left, the mean is to the left 
of the median. 
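The effect of skewness on the three measures can be seen with Python's standard `statistics` module. This is a sketch using two hypothetical data sets (not from the text): one with a long tail toward large values, and its mirror image with a long tail toward small values.

```python
from statistics import mean, median, mode

# Hypothetical data sets, each with a single mode
skewed_right = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]   # tail extends to the right
skewed_left = [16 - x for x in skewed_right]     # mirror image: tail extends to the left

for name, data in [("skewed right", skewed_right), ("skewed left", skewed_left)]:
    print(f"{name}: mean={mean(data)}, median={median(data)}, mode={mode(data)}")
# skewed right: mean=4.5, median=3.0, mode=2
# skewed left: mean=11.5, median=13.0, mode=14
```

In each case the mean is pulled toward the tail, away from the median and the mode.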




2.3 EXERCISES 


For Extra Help: MyLab Statistics 


Building Basic Skills and Vocabulary 


True or False? In Exercises 1-4, determine whether the statement is true or false. If it is false, rewrite it as a true statement.


1. The mean is the measure of central tendency most likely to be affected by an 
outlier. 


2. Some quantitative data sets do not have medians. 
3. A data set can have the same mean, median, and mode. 


4. When each data class has the same frequency, the distribution is symmetric. 


Constructing Data Sets In Exercises 5-8, construct the described data set. The entries in the data set cannot all be the same.


5. Median and mode are the same. 
6. Mean and mode are the same. 
7. Mean is not representative of a typical number in the data set. 


8. Mean, median, and mode are the same. 


Graphical Analysis In Exercises 9-12, determine whether the approximate 
shape of the distribution in the histogram is symmetric, uniform, skewed left, 
skewed right, or none of these. Justify your answer. 


9. [Histogram; horizontal axis 25,000 to 85,000]   10. [Histogram; horizontal axis 85 to 155]

11. [Histogram]   12. [Histogram]   (One of the graphs for Exercises 11 and 12 has horizontal axis 1 to 12.)
Matching In Exercises 13-16, match the distribution with one of the graphs in Exercises 9-12. Justify your decision.

13. The frequency distribution of 180 rolls of a dodecahedron (a 12-sided die)


14. The frequency distribution of mileages of service vehicles at a business where 
a few vehicles have much higher mileages than the majority of vehicles 


15. The frequency distribution of scores on a 90-point test where a few students 
scored much lower than the majority of students 


16. The frequency distribution of weights for a sample of seventh-grade boys 




Using and Interpreting Concepts 


Finding and Discussing the Mean, Median, and Mode In Exercises 17-34, find the mean, the median, and the mode of the data, if possible. If any measure cannot be found or does not represent the center of the data, explain why.


17. Subject Scores The English scores for a sample of 14 students 


26 24 26 24 29 25 28 
27 21 23 26 23 21 24 


18. Salaries The salaries (in thousand dollars) of a sample of 10 employees 
225 410 368 310 228 298 361 159 486 296 


19. Idle Times The durations (in minutes) of idle times at a factory in the last 
10 months 


18 32 28 34 46 62 38 
22 64 42 22 34 68 12 
54 28 18 60 28 50 28 


20. Leaders The ages (in completed years) of the youngest leaders at the time 
of assuming office (Source: NinjaJournalist) 


33 37 37 39 39 41 43 43 44 45 


21. Tuition The 2016-2017 tuition and fees (in thousands of dollars) for the top 
14 universities in the U.S. (Source: U.S. News & World Report) 


45 47 52 49 55 48 48 
51 51 50 51 48 51 51


22. Cholesterol The cholesterol levels of a sample of 10 female employees 
154 240 171 188 235 203 184 173 181 275 


23. Ports of Entry The maximum numbers of passenger vehicle lanes at 
16 Canadian border ports of entry (Source: U.S. Customs and Border Protection) 


8 6 10 3 6 11 17 2 
2 6 1 10 3 19 10 5 


24. Power Failures The durations (in minutes) of power failures at a residence in the last 10 years


18 26 45 75 125 80 33 
40 44 49 89 80 96 125 
12 61 31 63 103 28 19 


25. Treatment of Depression The numbers of patients who responded 
to various combinations of electroconvulsive therapy, medication, and 
cognitive-behavioral therapy to treat acute depression over different time 
periods (Source: Adapted from Bipolar Network News) 


42 15 8 9 13 6 7 


26. Number One Movies The numbers of weeks the 33 leading movies remained at number 1 as of March 2018 (Source: BoxOfficeMojo)


15 7 8 6 14 5 14 4 5 4 11 8 
7 4 12 6 10 9 4 5 5 5 14 5 
4 4 5 9 6 5 4 6 5




27. Online Shopping The responses of a sample of 5330 shoppers who were asked how their purchases are made are shown in the table at the left. (Adapted from UPS)

How purchases are made                        Frequency, f
Research online and in store, buy in store    1173
Search and buy online                         7038
Search and buy in store                       1066
[illegible]                                   853
TABLE FOR EXERCISE 27

28. Criminal Justice The responses of a sample of 34 young adult United Kingdom males in custodial sentences who were asked what is affected by such sentences (Adapted from User Voice)
Mental health: 8   Trust: 3   Education: 8   Personal development: 5
Family: 3   Future opportunities: 3   Other: 4

29. Class Level The class levels of 25 students in a physics course
Freshman: 2   Sophomore: 5   Junior: 10   Senior: 8

30. Small Business Websites The pie chart at the left shows the responses of a sample of 352 small-business owners who were asked whether their business has a website. (Source: Clutch)
[FIGURE FOR EXERCISE 30: pie chart with the responses Yes, since 2015; Yes, 2014 or earlier; No, but plan to in 2016; No, but likely in 2017 or later; No, neither likely nor unlikely in the future; No, unlikely in the future]

31. Weights (in pounds) of Packages on a Delivery Truck
Key: 3|0 = 30
1 | 0 1 3 6
2 | 1 1 3 3 3 6 7 7
3 | 0 1 2 4 4 4 5 7 8
4 | 1 3 4 5 6 9
5 | 2

32. Grade Point Averages of Students in a Class
Key: 0|8 = 0.8
0 | 8
1 | 5 6 8
2 | 1 3 4 5
3 | 0 9
4 | 0 0

33. Times (in minutes) It Takes Employees to Drive to Work
34. Prices (in dollars) of Flights from Chicago to Atlanta
[Figures for Exercises 33 and 34: dot plots; one horizontal axis runs from 5 to 40]


Graphical Analysis In Exercises 35 and 36, identify any clusters, gaps, or outliers.


35. Model Year 2017 Ethanol Flexible Fuel Vehicles
[Histogram: Frequency vs. Driving range (in miles), 250 to 600]
(Source: United States Environmental Protection Agency)

36. Model Year 2017 Hybrid Electric Cars
[Histogram: Frequency vs. Annual fuel cost (in dollars)]
(Source: Based on United States Environmental Protection Agency)




In Exercises 37-40, without performing any calculations, determine which measure 
of central tendency best represents the graphed data. Explain your reasoning. 


37. How Often Do You Change Jobs?
[Graph: Frequency by Response] (Source: Jobvite)

38. Heights of Players on Two Opposing Volleyball Teams
[Graph: Height (in inches), 70 to 77]

39. Heart Rates of a Sample of Adults
[Histogram: Frequency vs. Heart rate (in beats per minute), 55 to 85]

40. Body Mass Indexes (BMI) of People in a Gym
[Histogram: Frequency]


Finding a Weighted Mean In Exercises 41-46, find the weighted mean of the data.


41. Final Grade The scores and their percents of the final grade for a biology student are shown below. What is the student’s mean score?

                      Score   Percent of Final Grade
Assignment            75      10%
Class Participation   60      25%
Practical             90      25%
Theory Exam           85      40%

42. Final Grade The scores and their percents of the final grade for a statistics student are shown below. What is the student’s mean score?

                      Score   Percent of Final Grade
Assignment            75      10%
Class Participation   60      25%
Practical             90      25%
Theory Exam           85      40%

43. Account Balance For the month of April, a checking account has a balance of $523 for 24 days, $2415 for 2 days, and $250 for 4 days. What is the account’s mean daily balance for April?

44. Credit Card Balance For the month of October, a credit card has a balance of $115.63 for 12 days, $637.19 for 6 days, $1225.06 for 7 days, $0 for 2 days, and $34.88 for 4 days. What is the account’s mean daily balance for October?




45. Scores The mean scores for students in a statistics course (by major) are 
shown below. What is the mean score for the class? 


9 engineering majors: 85   5 math majors: 90   13 business majors: 81


46. Grades A student receives the grades shown below, with an A worth 
4 points, a B worth 3 points, a C worth 2 points, and a D worth 1 point. What 
is the student’s grade point average? 


A in 1 four-credit class     C in 1 three-credit class
B in 2 three-credit classes  D in 1 two-credit class
47. Final Grade In Exercise 41, an error was made in grading your practical. 
Instead of getting 90, you scored 100. What is your new weighted mean? 


48. Grades In Exercise 46, one of the student’s B grades gets changed to an A. 
What is the student’s new grade point average? 


Finding the Mean of a Frequency Distribution In Exercises 49-52, approximate the mean of the frequency distribution.


49. Car Speeds The optimum speeds (in kilometers per hour) for 30 hatchbacks

Car Speeds (in kilometers per hour)   Frequency
20-24                                 15
25-29                                 8
30-34                                 4
35-39                                 3

50. Car Speeds The optimum speeds (in kilometers per hour) for 30 hatchbacks

Car Speeds (in kilometers per hour)   Frequency
20-24                                 8
25-29                                 16
30-34                                 5
35-39                                 1

51. Ages The ages (in years) of the residents of a small town in 2012

Age (in years)   Frequency
0-9              40
10-19            72
20-29            78
30-39            90
40-49            84
50-59            42
60-69            31
70-79            [illegible]
80-89            18
90-99            [illegible]

52. Populations The populations (in thousands) of the parishes of Louisiana in 2015 (Source: U.S. Census Bureau)

Population (in thousands)   Frequency
0-49                        [illegible]
50-99                       [illegible]
100-149                     6
150-199                     2
200-249                     1
250-299                     2
300-349                     0
350-399                     1
400-449                     2


Identifying the Shape of a Distribution In Exercises 53-56, construct a frequency distribution and a frequency histogram for the data set using the indicated number of classes. Describe the shape of the histogram as symmetric, uniform, negatively skewed, positively skewed, or none of these.


53. Hospital Beds
Number of classes: 5
Data set: The number of beds in a sample of 20 hospitals

167 162 127 130 180 160 167 221 145 137
194 207 150 254 262 244 297 137 204 180


U.S. trade deficits (in billions of dollars)
China: 367.2              Germany: 74.8
Japan: 68.9               Mexico: 60.7
Vietnam: 30.9             Ireland: 30.4
South Korea: 28.3         Italy: 28.0
India: 23.3               Malaysia: 21.7
France: 17.7              Thailand: 17.4
Canada: 15.5              Taiwan: 15.0
Indonesia: 12.5           Israel: 10.9
Russian Federation: 9.3   Switzerland: 9.2

TABLE FOR EXERCISE 58




54. Emergency Room
Number of classes: 6


Data set: The numbers of patients visiting an emergency room per day 
over a two-week period 


256 317 237 182 382 106 162 
112 162 264 104 194 236 227 


55. Weights of Females
Number of classes: 5
Data set: The weights (to the nearest kilogram) of 30 females


46 48 40 58 56 60 42 43 49 52 
63 44 49 51 42 47 69 65 44 53 
50 51 47 41 64 62 54 49 70 68 


56. Six-Sided Die
Number of classes: 6
Data set: The results of rolling a six-sided die 30 times


14615 3 25 4 6 

1243 563 2141 

5 62443 162 4 

57. Cement During a quality assurance check, the actual weights (in kilograms) of eight sacks of cement were recorded as 20.5, 19.4, 19.6, 18.0, 21.0, 20.2, 20.4, and 20.9.


(a) Find the mean and the median of the contents. 
(b) The fifth value was incorrectly measured and is actually 20.0. Find the 
mean and the median of the contents again. 


(c) Which measure of central tendency, the mean or the median, was 
affected more by the data entry error? 


58. U.S. Trade Deficits The table at the left shows the U.S. trade deficits (in 
billions of dollars) with 18 countries in 2015. (Source: U.S. Department of 
Commerce) 


(a) Find the mean and the median of the trade deficits. 

(b) Find the mean and the median without the Chinese trade deficit. Which 
measure of central tendency, the mean or the median, was affected more 
by the elimination of the Chinese trade deficit? 

(c) The Austrian trade deficit was $7.3 billion. Find the mean and the 
median with the Austrian trade deficit added to the original data set. 
Which measure of central tendency was affected more? 


Graphical Analysis In Exercises 59 and 60, the letters A, B, and C are marked on the horizontal axis. Describe the shape of the data. Then determine which is the mean, which is the median, and which is the mode. Justify your answers.


59. Sick Days Used by Employees
[Histogram: Frequency vs. Number of days, with A, B, and C marked on the horizontal axis]

60. Hourly Wages of Employees
[Histogram: Frequency vs. Hourly wage, with A, B, and C marked on the horizontal axis]


[TABLE FOR EXERCISE 63: gas mileages (in miles per gallon) of Car A, Car B, and Car C for Runs 1 to 5]

[TABLE FOR EXERCISE 65: test scores of a sample of 30 students]

Extending Concepts

61. Writing In an academic year, a student receives the grades shown below, with an A worth 4 points, a B worth 3 points, and a C worth 2 points.

A in 2 four-credit classes and 3 three-credit classes
B in 2 three-credit classes and 2 two-credit classes
C in 1 two-credit class

The student can increase one of the Bs or Cs by one letter grade. Which one should the student choose? Explain your reasoning.

62. Golf The distances (in yards) for nine holes of a golf course are listed.
336 393 408 522 147 504 177 375 360
(a) Find the mean and the median of the data.
(b) Convert the distances to feet. Then rework part (a).
(c) Compare the measures you found in part (b) with those found in part (a). What do you notice?
(d) Use your results from part (c) to explain how to quickly find the mean and the median of the original data set when the distances are converted to inches.

63. Data Analysis A consumer testing service obtained the gas mileages (in miles per gallon) shown in the table at the left in five test runs performed with three types of compact cars.
(a) The manufacturer of Car A wants to advertise that its car performed best in this test. Which measure of central tendency (mean, median, or mode) should be used for its claim? Explain your reasoning.
(b) The manufacturer of Car B wants to advertise that its car performed best in this test. Which measure of central tendency (mean, median, or mode) should be used for its claim? Explain your reasoning.
(c) The manufacturer of Car C wants to advertise that its car performed best in this test. Which measure of central tendency (mean, median, or mode) should be used for its claim? Explain your reasoning.

64. Midrange Another measure of central tendency, which is rarely used, is the midrange. It can be found by using the formula

$\text{Midrange} = \dfrac{(\text{Maximum data entry}) + (\text{Minimum data entry})}{2}$

Which of the manufacturers in Exercise 63 would prefer to use the midrange statistic in their ads? Explain your reasoning.

65. Data Analysis Students in an experimental psychology class did research on depression as a sign of stress. A test was administered to a sample of 30 students. The scores are shown in the table at the left.
(a) Find the mean and the median of the data.
(b) Draw a stem-and-leaf plot for the data using one row per stem. Locate the mean and the median on the display.
(c) Describe the shape of the distribution.

66. Trimmed Mean To find the 10% trimmed mean of a data set, order the data, delete the lowest 10% of the entries and the highest 10% of the entries, and find the mean of the remaining entries.
(a) Find the 10% trimmed mean for the data in Exercise 65.
(b) Compare the four measures of central tendency, including the midrange.
(c) What is the benefit of using a trimmed mean versus using a mean found using all data entries? Explain your reasoning.
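The midrange (Exercise 64) and the trimmed mean (Exercise 66) are both easy to compute. The sketch below uses a hypothetical data set, not the data from Exercise 65. When 10% of the number of entries is not a whole number, the text does not say how to round; this sketch assumes the count of deleted entries is rounded down on each end.

```python
from statistics import mean


def midrange(data):
    """Midrange = ((maximum data entry) + (minimum data entry)) / 2."""
    return (max(data) + min(data)) / 2


def trimmed_mean(data, percent=10):
    """Mean after deleting the lowest and highest `percent`% of the ordered entries.

    Assumption: the number of entries deleted from each end is rounded down.
    """
    ordered = sorted(data)
    k = len(ordered) * percent // 100      # entries to delete from each end
    kept = ordered[k:len(ordered) - k]
    return mean(kept)


# Hypothetical data set with one low and one high outlier
scores = [2, 40, 42, 45, 47, 48, 50, 52, 55, 99]
print(midrange(scores))      # 50.5
print(trimmed_mean(scores))  # 47.375
print(mean(scores))          # 48
```

The midrange depends only on the two extreme entries, while the trimmed mean discards them, which is why the two measures can react very differently to outliers.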


ACTIVITY 2.3

Mean Versus Median

You can find the interactive applet for this activity within MyLab Statistics or at www.pearsonglobaleditions.com.


The mean versus median applet is designed to allow you to investigate 
interactively the mean and the median as measures of the center of a data set. 
Points can be added to the plot by clicking the mouse above the horizontal axis. 
The mean of the points is shown as a green arrow and the median is shown as a 
red arrow. When the two values are the same, a single yellow arrow is displayed. 
Numeric values for the mean and the median are shown above the plot. Points 
on the plot can be removed by clicking on the point and then dragging the 
point into the trash can. All of the points on the plot can be removed by simply 
clicking inside the trash can. The range of values for the horizontal axis can be 
specified by inputting lower and upper limits and then clicking UPDATE. 




EXPLORE 


Step 1 Specify a lower limit. 

Step 2 Specify an upper limit. 

Step 3 Add 15 points to the plot. 

Step 4 Remove all of the points from the plot. 


DRAW CONCLUSIONS 


1. Specify the lower limit to be 1 and the upper limit to be 50. Add at least 10 
points that range from 20 to 40 so that the mean and the median are the 
same. What is the shape of the distribution? What happens at first to the 
mean and the median when you add a few points that are less than 10? What 
happens over time as you continue to add points that are less than 10? 


2. Specify the lower limit to be 0 and the upper limit to be 0.75. Place 10 
points on the plot. Then change the upper limit to 25. Add 10 more points 
that are greater than 20 to the plot. Can the mean be any one of the points 
that were plotted? Can the median be any one of the points that were 
plotted? Explain. 




What You Should Learn 


• How to find the range of a data set
• How to find the variance and standard deviation of a population and of a sample
• How to use the Empirical Rule and Chebychev’s Theorem to interpret standard deviation
• How to estimate the sample standard deviation for grouped data
• How to use the coefficient of variation to compare variation in different data sets


[Figures: histograms of Frequency vs. Starting salary (in thousands of dollars) for Corporation A and Corporation B; each horizontal axis is marked 25.5, 31.5, 37.5, 43.5, 49.5, and 55.5]


2.4 Measures of Variation

Range • Variance and Standard Deviation • Interpreting Standard Deviation • Standard Deviation for Grouped Data • Coefficient of Variation


Range 


In this section, you will learn different ways to measure the variation (or spread) 
of a data set. The simplest measure is the range of the set. 


DEFINITION 


The range of a data set is the difference between the maximum and minimum data entries in the set. To find the range, the data must be quantitative.

Range = (Maximum data entry) − (Minimum data entry)


EXAMPLE 1
Finding the Range of a Data Set

Two corporations each hired 10 graduates. The starting salaries for each graduate are shown. Find the range of the starting salaries for Corporation A.

Starting Salaries for Corporation A (in thousands of dollars)
Salary   41 38 39 45 47 41 44 41 37 42

Starting Salaries for Corporation B (in thousands of dollars)
Salary   40 23 41 50 49 32 41 29 52 58

SOLUTION
Ordering the data helps to find the least and greatest salaries.

37 38 39 41 41 41 42 44 45 47
(The minimum is 37 and the maximum is 47.)

Range = (Maximum salary) − (Minimum salary)
      = 47 − 37
      = 10
So, the range of the starting salaries for Corporation A is 10, or $10,000.


TRY IT YOURSELF 1 


Find the range of the starting salaries for Corporation B. Compare the result to the one in Example 1.
Answer: Page A33


Both data sets in Example 1 have a mean of 41.5, or $41,500, a median of 41, 
or $41,000, and a mode of 41, or $41,000. And yet the two sets differ significantly. 
The difference is that the entries in the second set have greater variation. As you 
can see in the figures at the left, the starting salaries for Corporation B are more 
spread out than those for Corporation A. 
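The claims in this paragraph can be checked with Python's `statistics` module. This sketch uses the salaries from Example 1; the helper name `data_range` is illustrative, not from the text. It computes the range for Corporation A only (the range for Corporation B is left for Try It Yourself 1) and confirms that both corporations share the same mean, median, and mode.

```python
from statistics import mean, median, mode

# Starting salaries (in thousands of dollars) from Example 1
corp_a = [41, 38, 39, 45, 47, 41, 44, 41, 37, 42]
corp_b = [40, 23, 41, 50, 49, 32, 41, 29, 52, 58]


def data_range(data):
    """Range = (maximum data entry) - (minimum data entry)."""
    return max(data) - min(data)


print("Range for Corporation A:", data_range(corp_a))  # 10

# Same center for both corporations, despite the difference in spread
for name, data in [("A", corp_a), ("B", corp_b)]:
    print(name, mean(data), median(data), mode(data))
# A 41.5 41.0 41
# B 41.5 41.0 41
```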


SECTION 2.4 Measures of Variation 105 


Variance and Standard Deviation 


As a measure of variation, the range has the advantage of being easy to compute. 
Its disadvantage, however, is that it uses only two entries from the data set. Two 
measures of variation that use all the entries in a data set are the variance and the 
standard deviation. Before you learn about these measures of variation, you need 
to know what is meant by the deviation of an entry in a data set. 


DEFINITION 


The deviation of an entry x in a population data set is the difference between the entry and the mean μ of the data set.

Deviation of x = $x - \mu$


Deviations of Starting Salaries for Corporation A

Salary (in 1000s of dollars), x   Deviation (in 1000s of dollars), $x - \mu$
41                                −0.5
38                                −3.5
39                                −2.5
45                                3.5
47                                5.5
41                                −0.5
44                                2.5
41                                −0.5
37                                −4.5
42                                0.5
$\sum x = 415$                    $\sum (x - \mu) = 0$

The sum of the deviations is 0.

Consider the starting salaries for Corporation A in Example 1. The mean starting salary is μ = 415/10 = 41.5, or $41,500. The table lists the deviation of each salary from the mean. For instance, the deviation of 41 is 41 − 41.5 = −0.5. Notice that the sum of the deviations is 0. In fact, the sum of the deviations for any data set is 0. So, it does not make sense to find the average of the deviations. To overcome this problem, take the square of each deviation. The sum of the squares of the deviations, or sum of squares, is denoted by $SS_x$. In a population data set, the average of the squares of the deviations is the population variance.

DEFINITION
The population variance of a population data set of N entries is

Population variance = $\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$

The symbol σ is the lowercase Greek letter sigma.

As a measure of variation, one disadvantage with the variance is that its units are different from the data set. For instance, the variance for the starting salaries (in thousands of dollars) in Example 1 is measured in “square thousands of dollars.” To overcome this problem, take the square root of the variance to get the standard deviation.


DEFINITION 


The population standard deviation of a population data set of N entries is the square root of the population variance.

Population standard deviation = $\sigma = \sqrt{\sigma^2} = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$


Here are some observations about the standard deviation. 


• The standard deviation measures the variation of the data set about the mean and has the same units of measure as the data set.

• The standard deviation is always greater than or equal to 0. When σ = 0, the data set has no variation and all entries have the same value.

• As the entries get farther from the mean (that is, more spread out), the value of σ increases.




To find the variance and standard deviation of a population data set, use 
these guidelines. 


GUIDELINES 


Finding the Population Variance and Standard Deviation
In Words                                                   In Symbols
1. Find the mean of the population data set.               $\mu = \dfrac{\sum x}{N}$
2. Find the deviation of each entry.                       $x - \mu$
3. Square each deviation.                                  $(x - \mu)^2$
4. Add to get the sum of squares.                          $SS_x = \sum (x - \mu)^2$
5. Divide by N to get the population variance.             $\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$
6. Find the square root of the variance to
   get the population standard deviation.                  $\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$
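Each of the six guidelines corresponds to one line of the Python sketch below. The function name `population_variance_and_sd` is hypothetical; the data are the Corporation A salaries from Example 1. The sketch also confirms that the deviations sum to 0.

```python
import math


def population_variance_and_sd(data):
    """Population variance and standard deviation, following the six guidelines."""
    N = len(data)
    mu = sum(data) / N                       # 1. mean of the population
    deviations = [x - mu for x in data]      # 2. deviation of each entry
    squares = [d ** 2 for d in deviations]   # 3. square each deviation
    ss = sum(squares)                        # 4. sum of squares SS_x
    variance = ss / N                        # 5. divide by N
    sd = math.sqrt(variance)                 # 6. square root of the variance
    return deviations, ss, variance, sd


# Corporation A starting salaries (in thousands of dollars) from Example 1
corp_a = [41, 38, 39, 45, 47, 41, 44, 41, 37, 42]
deviations, ss, variance, sd = population_variance_and_sd(corp_a)
print(sum(deviations))                         # 0.0
print(ss, round(variance, 2), round(sd, 1))    # 88.5 8.85 3.0
```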


EXAMPLE 2
Finding the Population Variance and Standard Deviation
Find the population variance and standard deviation of the starting salaries for Corporation A listed in Example 1.

SOLUTION
For this data set, N = 10 and $\sum x = 415$. The mean is

$\mu = \dfrac{415}{10} = 41.5$.   Mean

The table below summarizes the steps used to find $SS_x$.

Sum of Squares of Starting Salaries for Corporation A
Salary, x        Deviation, $x - \mu$    Squares, $(x - \mu)^2$
41               −0.5                    0.25
38               −3.5                    12.25
39               −2.5                    6.25
45               3.5                     12.25
47               5.5                     30.25
41               −0.5                    0.25
44               2.5                     6.25
41               −0.5                    0.25
37               −4.5                    20.25
42               0.5                     0.25
$\sum x = 415$   $\sum (x - \mu) = 0$    $SS_x = 88.5$

Because $SS_x = 88.5$ (sum of squares), you can find the variance and standard deviation as shown.

$\sigma^2 = \dfrac{88.5}{10} \approx 8.9$   Round to one more decimal place than the original data.
$\sigma = \sqrt{\dfrac{88.5}{10}} \approx 3.0$   Round to one more decimal place than the original data.


So, the population variance is about 8.9, and the population standard deviation 
is about 3.0, or $3000. 


TRY IT YOURSELF 2

Find the population variance and standard deviation of the starting salaries for Corporation B in Example 1.
Answer: Page A33

Study Tip
Notice that the variance and standard deviation in Example 2 have one more decimal place than the original set of data entries. This is the same round-off rule that was used to calculate the mean.

The formulas shown on the next page for the sample variance $s^2$ and sample standard deviation $s$ of a sample data set differ slightly from those of a population. For instance, to find $s$, the formula uses $\bar{x}$. Also, $SS_x$ is divided by $n - 1$. Why divide by one less than the number of entries? In many cases, a statistic is calculated to estimate the corresponding parameter, such as using $\bar{x}$ to estimate $\mu$. Statistical theory has shown that the best estimates of $\sigma^2$ and $\sigma$ are obtained when dividing $SS_x$ by $n - 1$ in the formulas for $s^2$ and $s$.
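One precise sense in which dividing by $n - 1$ is "best" can be demonstrated exactly: averaged over every possible sample, $SS_x/(n-1)$ equals the population variance $\sigma^2$, while $SS_x/n$ falls short. The Python sketch below assumes a tiny hypothetical population {1, 2, 3} and all ordered samples of size 2 drawn with replacement, and uses `fractions.Fraction` so the arithmetic is exact. This illustrates unbiasedness of the variance only; it does not make the same claim about $s$ as an estimate of $\sigma$.

```python
from fractions import Fraction
from itertools import product

# A tiny hypothetical population (not from the text)
population = [1, 2, 3]
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance


def sum_of_squares(sample):
    """SS_x for a sample: sum of squared deviations from the sample mean."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample)


# Every ordered sample of size n = 2 drawn with replacement (9 equally likely samples)
n = 2
samples = list(product(population, repeat=n))
avg_div_n_minus_1 = sum(sum_of_squares(s) / (n - 1) for s in samples) / len(samples)
avg_div_n = sum(sum_of_squares(s) / n for s in samples) / len(samples)

print(sigma2, avg_div_n_minus_1, avg_div_n)   # 2/3 2/3 1/3
```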


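Symbols in Variance and Standard Deviation Formulas
                     Population              Sample
Variance             $\sigma^2$              $s^2$
Standard deviation   $\sigma$                $s$
Mean                 $\mu$                   $\bar{x}$
Number of entries    $N$                     $n$
Deviation            $x - \mu$               $x - \bar{x}$
Sum of squares       $\sum (x - \mu)^2$      $\sum (x - \bar{x})^2$

Time, x          Deviation, $x - \bar{x}$    Squares, $(x - \bar{x})^2$
4                −3.5                        12.25
7                −0.5                        0.25
6                −1.5                        2.25
7                −0.5                        0.25
9                1.5                         2.25
5                −2.5                        6.25
8                0.5                         0.25
10               2.5                         6.25
9                1.5                         2.25
8                0.5                         0.25
7                −0.5                        0.25
10               2.5                         6.25
$\sum x = 90$    $\sum (x - \bar{x}) = 0$    $SS_x = 39$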




DEFINITION 


The sample variance and sample standard deviation of a sample data set of n entries are listed below.

Sample variance = $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$

Sample standard deviation = $s = \sqrt{s^2} = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$


GUIDELINES 


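Finding the Sample Variance and Standard Deviation
In Words                                                   In Symbols
1. Find the mean of the sample data set.                   $\bar{x} = \dfrac{\sum x}{n}$
2. Find the deviation of each entry.                       $x - \bar{x}$
3. Square each deviation.                                  $(x - \bar{x})^2$
4. Add to get the sum of squares.                          $SS_x = \sum (x - \bar{x})^2$
5. Divide by n − 1 to get the sample variance.             $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$
6. Find the square root of the variance to
   get the sample standard deviation.                      $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$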
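The sample guidelines differ from the population guidelines only in step 5. This Python sketch follows the six steps; the function name `sample_variance_and_sd` is hypothetical, and the data are the Group 1 recovery times from Example 3.

```python
import math


def sample_variance_and_sd(data):
    """Sample variance and standard deviation, following the six guidelines."""
    n = len(data)
    xbar = sum(data) / n                          # 1. mean of the sample
    ss = sum((x - xbar) ** 2 for x in data)       # 2-4. deviations, squares, sum of squares
    variance = ss / (n - 1)                       # 5. divide by n - 1 (not n)
    return ss, variance, math.sqrt(variance)      # 6. square root of the variance


# Group 1 recovery times (in days) from Example 3
group_1 = [4, 7, 6, 7, 9, 5, 8, 10, 9, 8, 7, 10]
ss, s2, s = sample_variance_and_sd(group_1)
print(ss, round(s2, 1), round(s, 1))   # 39.0 3.5 1.9
```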


See Minitab and TI-84 Plus 
steps on pages 146 and 147. 


EXAMPLE 3
Finding the Sample Variance and Standard Deviation

In a study of high school football players that suffered concussions, researchers placed the players in two groups. Players that recovered from their concussions in 14 days or less were placed in Group 1. Those that took more than 14 days were placed in Group 2. The recovery times (in days) for Group 1 are listed below. Find the sample variance and standard deviation of the recovery times. (Adapted from The American Journal of Sports Medicine)

4 7 6 7 9 5 8 10 9 8 7 10

SOLUTION
For this data set, n = 12 and $\sum x = 90$. The mean is $\bar{x} = 90/12 = 7.5$. To calculate $s^2$ and $s$, note that $n - 1 = 12 - 1 = 11$.

$SS_x = 39$   Sum of squares (see table)
$s^2 = \dfrac{39}{11} \approx 3.5$   Sample variance (divide $SS_x$ by $n - 1$)
$s = \sqrt{\dfrac{39}{11}} \approx 1.9$   Sample standard deviation


So, the sample variance is about 3.5, and the sample standard deviation is 
about 1.9 days. 




Office rental rates 


51 30 15 
47 14 87 
33 11 35 
74 42 51 
24 40 26 
36 22 40 
41 35 36 
42 29 24 


TRY IT YOURSELF 3 


Refer to the study in Example 3. The recovery times (in days) for Group 2 are listed 
below. Find the sample variance and standard deviation of the recovery times. 


43 57 18 45 47 33 49 24 


Answer: Page A33 


EXAMPLE 4
Using Technology to Find the Standard Deviation

Sample office rental rates (in dollars per square foot per year) for Los Angeles are shown in the table at the left. Use technology to find the mean rental rate and the sample standard deviation. (Adapted from LoopNet.com)

SOLUTION
Minitab, Excel, and the TI-84 Plus each have features that calculate the means and the standard deviations of data sets. Try using this technology to find the mean and the standard deviation of the office rental rates. From the displays, you can see that

$\bar{x} \approx 36.9$ and $s \approx 17.4$.


MINITAB
Descriptive Statistics: Office Rental Rates
Variable       N    Mean    SE Mean   StDev   Minimum   Q1      Median   Q3      Maximum
Rental Rates   24   36.88   3.55      17.39   11.00     24.50   35.50    42.00   87.00

EXCEL
1    Mean                  36.875
2    Standard Error        3.550011
3    Median                35.5
4    Mode                  51
5    Standard Deviation    17.39143
6    Sample Variance       302.462
7    Kurtosis              2.954212
8    Skewness              1.214477
9    Range                 76
10   Minimum               11
11   Maximum               87
12   Sum                   885
13   Count                 24

TI-84 PLUS
1-Var Stats
x̄=36.875          (sample mean)
Σx=885
Σx²=39591
Sx≈17.39143       (sample standard deviation)
σx=17.02525697
n=24

TRY IT YOURSELF 4

Sample office rental rates (in dollars per square foot per year) for Dallas are listed. Use technology to find the mean rental rate and the sample standard deviation. (Adapted from LoopNet.com)

18 27 21 14 20 20 24 11
16 7 12 22 10 15 21 34
23 13 38 16 18 30 15 30

Answer: Page A33
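The Minitab, Excel, and TI-84 Plus displays in Example 4 can also be reproduced with Python's standard `statistics` module, swapped in here as one more piece of technology. In that module, `variance` and `stdev` divide by n − 1 (sample), while `pstdev` divides by N (population, matching the TI-84 σx line). The rates below are the Los Angeles data from Example 4.

```python
from statistics import mean, median, pstdev, stdev, variance

# Los Angeles office rental rates (dollars per square foot per year) from Example 4
rates = [51, 30, 15, 47, 14, 87, 33, 11, 35, 74, 42, 51,
         24, 40, 26, 36, 22, 40, 41, 35, 36, 42, 29, 24]

print(len(rates), sum(rates), sum(x * x for x in rates))  # 24 885 39591
print(mean(rates), median(rates))                         # 36.875 35.5
print(round(variance(rates), 3))   # sample variance: 302.462
print(round(stdev(rates), 5))      # sample standard deviation s: 17.39143
print(round(pstdev(rates), 8))     # population standard deviation: 17.02525697
```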


Study Tip
You can use standard deviation to compare variation in data sets that use the same units of measure and have means that are about the same. For instance, in the data sets with x̄ = 5 shown at the right, the data set with s ≈ 3.0 is more spread out than the other data sets. Not all data sets, however, use the same units of measure or have approximately equal means. To compare variation in these data sets, use the coefficient of variation, which is discussed later in this section.

To explore this topic further, see Activity 2.4 on page 122.


Entry   Deviation   Squares
x       x − μ       (x − μ)²
1       −3          9
3       −1          1
5        1          1
7        3          9


SECTION 2.4 Measures of Variation 109 


Interpreting Standard Deviation 


When interpreting the standard deviation, remember that it is a measure of the 
typical amount an entry deviates from the mean. The more the entries are spread 
out, the greater the standard deviation. 


[Figure: three histograms of frequency versus data entry (1 to 9), each with x̄ = 5; the standard deviations are s = 0, s ≈ 1.2, and s ≈ 3.0.]


Estimating Standard Deviation 
Without calculating, estimate the population standard deviation of each data set. 


[Figure: three histograms, labeled 1, 2, and 3, of frequency versus data entry (0 to 7), each with N = 8 and μ = 4.]


SOLUTION
1. Each of the eight entries is 4. The deviation of each entry is 0, so

$\sigma = 0.$   Standard deviation

2. Each of the eight entries has a deviation of ±1. So, the population standard deviation should be 1. By calculating, you can see that

$\sigma = 1.$   Standard deviation

3. Each of the eight entries has a deviation of ±1 or ±3. So, the population standard deviation should be about 2. By calculating, you can see that σ is greater than 2, with

$\sigma \approx 2.2.$   Standard deviation
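To check the three estimates, the sketch below computes σ with Python's `statistics.pstdev`. The histogram values are not listed in the text, so the three data sets are assumptions chosen to fit the solution's description: eight entries with mean 4 whose deviations are all 0, all ±1, or ±1 and ±3. The third set uses the entries 1, 3, 5, and 7 from the table at the left, two of each, which is an assumption that reproduces σ ≈ 2.2.

```python
import statistics

# Assumed data sets consistent with the deviations described in the solution
data_sets = {
    "1": [4] * 8,                   # every deviation is 0
    "2": [3] * 4 + [5] * 4,         # every deviation is ±1
    "3": [1, 1, 3, 3, 5, 5, 7, 7],  # deviations of ±1 or ±3
}

for label, data in data_sets.items():
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)  # population standard deviation
    print(f"data set {label}: mean = {mu}, sigma = {sigma:.2f}")
```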


TRY IT YOURSELF 5 


Write a data set that has 10 entries, a mean of 10, and a population standard
deviation that is approximately 3. (There are many correct answers.)
Answer: Page A33


Data entries that lie more than two standard deviations from the mean are 
considered unusual, while those that lie more than three standard deviations 
from the mean are very unusual. Unusual and very unusual entries have a greater 
influence on the standard deviation than entries closer to the mean. This happens 
because the deviations are squared. Consider the data entries from Example 5, 
part 3 (see table at the left). The squares of the deviations of the entries farther 
from the mean (1 and 7) have a greater influence on the value of the standard 
deviation than those closer to the mean (3 and 5). 
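The effect of squaring can be shown numerically. This sketch takes the entries 1, 3, 5, and 7 from the table at the left (population mean 4) and reports each squared deviation's share of the sum of squares. It also adds a hypothetical helper, `classify`, that applies the two- and three-standard-deviation cutoffs described above.

```python
# Entries from the table for Example 5, part 3 (population mean 4)
entries = [1, 3, 5, 7]
mu = sum(entries) / len(entries)

squared = {x: (x - mu) ** 2 for x in entries}
total = sum(squared.values())
for x, sq in squared.items():
    print(f"x = {x}: squared deviation = {sq:g}, share of total = {sq / total:.0%}")

def classify(x, mean, sd):
    """Hypothetical helper: label an entry by its distance from the mean in standard deviations."""
    distance = abs(x - mean) / sd
    if distance > 3:
        return "very unusual"  # more than three standard deviations away
    if distance > 2:
        return "unusual"       # more than two standard deviations away
    return "usual"

print(classify(11, mean=4, sd=2))  # 3.5 standard deviations from the mean
```

The entries 1 and 7 together supply 18 of the 20 units in the sum of squares, which is why they dominate σ.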


110 CHAPTER 2. Descriptive Statistics 


Picturing the World

A survey was conducted by the National Center for Health Statistics to find the mean height of males in the United States. The histogram shows the distribution of heights for the sample of men examined in the 20-29 age group. In this group, the mean was 69.4 inches and the standard deviation was 2.9 inches. (Adapted from National Center for Health Statistics)

[Figure: Heights of Men in the U.S., Ages 20-29. Histogram of relative frequency (in percent) versus height (in inches), 64 to 76.]

Estimate which two heights contain the middle 95% of the data. The height of a twenty-five-year-old male is 74 inches. Is this height unusual? Why or why not?


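[Figure: Heights of Women in the U.S., Ages 20-29. Bell-shaped histogram of height (in inches) centered at x̄ = 64.2, marked from x̄ − 3s to x̄ + 3s; for example, x̄ + s = 67.1, x̄ + 2s = 70.0, and x̄ + 3s = 72.9.]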


Many real-life data sets have distributions that are approximately symmetric 
and bell-shaped (see figure below). For instance, the distributions of men’s 
and women’s heights in the United States are approximately symmetric and 
bell-shaped (see the figures at the left and bottom left). Later in the text, you 
will study bell-shaped distributions in greater detail. For now, however, the 
Empirical Rule can help you see how valuable the standard deviation can be as 
a measure of variation. 


[Figure: Bell-Shaped Distribution. About 68% of the data lie within 1 standard deviation of the mean, about 95% within 2 standard deviations, and the span of 3 standard deviations is also marked.]


Empirical Rule (or 68-95-99.7 Rule) 

For data sets with distributions that are approximately symmetric and 
bell-shaped (see figure above), the standard deviation has these characteristics. 
1. About 68% of the data lie within one standard deviation of the mean. 

2. About 95% of the data lie within two standard deviations of the mean. 

3. About 99.7% of the data lie within three standard deviations of the mean. 
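Applying the Empirical Rule comes down to computing the intervals x̄ ± ks for k = 1, 2, and 3. Here is a minimal Python sketch using the women's heights from the example that follows (x̄ = 64.2 inches, s = 2.9 inches). The function name is hypothetical, and the percents are only approximate because the rule assumes a roughly bell-shaped distribution.

```python
def empirical_rule_intervals(mean, sd):
    """Hypothetical helper: {k: (low, high, approx_percent)} for mean ± k*sd, k = 1, 2, 3."""
    approx_percent = {1: 68, 2: 95, 3: 99.7}
    return {k: (mean - k * sd, mean + k * sd, pct) for k, pct in approx_percent.items()}

# Women's heights, ages 20-29: sample mean 64.2 in., sample standard deviation 2.9 in.
intervals = empirical_rule_intervals(64.2, 2.9)
for k, (low, high, pct) in intervals.items():
    print(f"about {pct}% of heights lie between {low:.1f} and {high:.1f} inches (k = {k})")
```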


Using the Empirical Rule 


In a survey conducted by the National Center for Health Statistics, the sample 
mean height of women in the United States (ages 20-29) was 64.2 inches, with 
a sample standard deviation of 2.9 inches. Estimate the percent of women 
whose heights are between 58.4 inches and 64.2 inches. (Adapted from National 
Center for Health Statistics) 


SOLUTION

The distribution of women's heights is shown at the left. Because the distribution is bell-shaped, you can use the Empirical Rule. The mean height is 64.2 inches, so when you subtract two standard deviations from the mean height, you get

$\bar{x} - 2s = 64.2 - 2(2.9) = 58.4.$

Because 58.4 is two standard deviations below the mean height, the percent of the heights between 58.4 and 64.2 inches is about 13.59% + 34.13% = 47.72%. Here, 13.59% is the area between x̄ − 2s and x̄ − s, and 34.13% is the area between x̄ − s and x̄.
Interpretation So, about 47.72% of women are between 58.4 and 64.2 inches tall.
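The percents 13.59% and 34.13% are areas under a normal curve, not the rounded Empirical Rule percents. Halving the rule's 95% would give about 47.5%. The sketch below assumes heights follow a normal distribution exactly and reproduces those two areas with Python's `statistics.NormalDist`, working in units of standard deviations from the mean.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Areas under the normal curve, measured in standard deviations from the mean
between_minus2_and_minus1 = z.cdf(-1) - z.cdf(-2)  # from x̄ - 2s to x̄ - s
between_minus1_and_mean = z.cdf(0) - z.cdf(-1)     # from x̄ - s to x̄
total = between_minus2_and_minus1 + between_minus1_and_mean
print(f"{between_minus2_and_minus1:.2%} + {between_minus1_and_mean:.2%} = {total:.2%}")
```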


TRY IT YOURSELF 6 


Estimate the percent of women ages 20-29 whose heights are between 
64.2 inches and 67.1 inches. Answer: Page A33 


Study Tip 


In Example 7, Chebychev's 
Theorem gives you an 
inequality statement that 
says at least 75% of the 
population of Georgia is 
under the age of 81.9. 

This is a true statement, but it is not 
nearly as strong a statement as could 
be made from reading the histogram. 


In general, Chebychev’s Theorem 
gives the minimum percent of data 
entries that fall within the given 
number of standard deviations of the 
mean. Depending on the distribution, 
there is probably a higher percent of 
data falling in the given range. 




The Empirical Rule applies only to (symmetric) bell-shaped distributions. 
What if the distribution is not bell-shaped, or what if the shape of the distribution 
is not known? The next theorem gives an inequality statement that applies to 
all distributions. It is named after the Russian statistician Pafnuti Chebychev 
(1821-1894). 


Chebychev’s Theorem 


The portion of any data set lying within k standard deviations (k > 1) of the
mean is at least

$$1 - \frac{1}{k^2}.$$

• k = 2: In any data set, at least $1 - \frac{1}{2^2} = \frac{3}{4}$, or 75%, of the data lie within 2 standard deviations of the mean.

• k = 3: In any data set, at least $1 - \frac{1}{3^2} = \frac{8}{9}$, or about 88.9%, of the data lie within 3 standard deviations of the mean.
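The bound 1 − 1/k² can be evaluated exactly with fractions. Here is a small Python sketch of that calculation; `chebychev_min_fraction` is a hypothetical helper name.

```python
from fractions import Fraction

def chebychev_min_fraction(k):
    """Hypothetical helper: lower bound on the fraction of any data set within k SDs of the mean."""
    k = Fraction(k)
    if k <= 1:
        raise ValueError("Chebychev's Theorem requires k > 1")
    return 1 - 1 / k**2

for k in (2, 3):
    bound = chebychev_min_fraction(k)
    print(f"k = {k}: at least {bound} (about {float(bound):.1%}) of the data")
```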


Using Chebychev’s Theorem 
The age distributions for Georgia and Iowa are shown in the histograms. 
Apply Chebychev’s Theorem to the data for Georgia using k = 2. What 
can you conclude? Is an age of 100 unusual for a Georgia resident? Explain. 
(Source: Based on U.S. Census Bureau) 


[Figure: Age distributions for Georgia and Iowa. Two histograms of population (in thousands) versus age (in years), with bars at ages 5, 15, 25, ..., 85.]


SOLUTION 


The histogram on the left shows Georgia's age distribution. Moving two standard deviations to the left of the mean puts you below 0, because

$\mu - 2\sigma \approx 37.3 - 2(22.3) = -7.3.$

Moving two standard deviations to the right of the mean puts you at

$\mu + 2\sigma \approx 37.3 + 2(22.3) = 81.9.$

By Chebychev's Theorem, you can say that at least 75% of the population of Georgia is between 0 and 81.9 years old. Also, because 100 > 81.9, an age of 100 lies more than two standard deviations from the mean. So, this age is unusual.
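The Georgia calculation can be scripted the same way. This sketch uses μ ≈ 37.3 and σ ≈ 22.3 from the solution. The optional lower bound of 0 reflects that ages cannot be negative, and the function name `chebychev_interval` is hypothetical.

```python
def chebychev_interval(mu, sigma, k, lower_bound=None):
    """Hypothetical helper: (low, high) for mu ± k*sigma, clipping low at a natural bound if given."""
    low, high = mu - k * sigma, mu + k * sigma
    if lower_bound is not None:
        low = max(low, lower_bound)
    return low, high

# Georgia: mu ≈ 37.3 years, sigma ≈ 22.3 years; ages cannot be negative, so clip at 0
low, high = chebychev_interval(37.3, 22.3, k=2, lower_bound=0)
print(f"at least 75% of Georgia residents are between {low:.1f} and {high:.1f} years old")
print("age 100 outside the interval:", not (low <= 100 <= high))
```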


TRY IT YOURSELF 7 


Apply Chebychev’s Theorem to the data for Iowa using k = 2. What can you
conclude? Is an age of 80 unusual for an Iowa resident? Explain. 
Answer: Page A33 




Standard Deviation for Grouped Data 


In Section 2.1, you learned that large data sets are usually best represented by
frequency distributions. The formula for the sample standard deviation for a
frequency distribution is

$$\text{Sample standard deviation} = s = \sqrt{\frac{\Sigma (x - \bar{x})^2 f}{n - 1}}$$

where $n = \Sigma f$ is the number of entries in the data set.

Study Tip
Remember that formulas for grouped data require you to multiply by the frequencies.


EXAMPLE 8 


Finding the Standard Deviation for Grouped Data 


You collect a random sample of the number of children per household in a 
region. The results are listed below. Find the sample mean and the sample 
standard deviation of the data set. 


13 1721721212 2«21 
0363 03 1 1 


0110001 
1 60 13 6 6 
223 011421 220 3 02 4 


1 
1 
SOLUTION 

These data could be treated as 50 individual entries, and you could use the formulas for mean and standard deviation. Because there are so many repeated numbers, however, it is easier to use a frequency distribution.

x    f        xf      x − x̄    (x − x̄)²    (x − x̄)²f
0    10       0       −1.82    3.3124      33.1240
1    19       19      −0.82    0.6724      12.7756
2    7        14       0.18    0.0324       0.2268
3    7        21       1.18    1.3924       9.7468
4    2        8        2.18    4.7524       9.5048
5    1        5        3.18    10.1124     10.1124
6    4        24       4.18    17.4724     69.8896
     Σ = 50   Σ = 91                       Σ = 145.38

$\bar{x} = \frac{\Sigma xf}{n} = \frac{91}{50} = 1.82 \approx 1.8$   Sample mean

Use the sum of squares to find the sample standard deviation.

$s = \sqrt{\frac{\Sigma (x - \bar{x})^2 f}{n - 1}} = \sqrt{\frac{145.38}{49}} \approx 1.7$   Sample standard deviation

So, the sample mean is about 1.8 children, and the sample standard deviation
is about 1.7 children.
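The grouped-data formula translates directly into a few lines of Python. This sketch assumes the frequency table above (children per household and the number of households) and applies n = Σf, x̄ = Σxf/n, and s = √(Σ(x − x̄)²f/(n − 1)). The helper name is hypothetical.

```python
import math

def grouped_mean_sd(values, freqs):
    """Hypothetical helper: sample mean and sample SD of a frequency distribution."""
    n = sum(freqs)                                                # n = Σf
    mean = sum(x * f for x, f in zip(values, freqs)) / n          # x̄ = Σxf / n
    ss = sum((x - mean) ** 2 * f for x, f in zip(values, freqs))  # Σ(x − x̄)²f
    return n, mean, ss, math.sqrt(ss / (n - 1))

# Frequency distribution from the table: children per household and their frequencies
children = [0, 1, 2, 3, 4, 5, 6]
households = [10, 19, 7, 7, 2, 1, 4]

n, mean, ss, s = grouped_mean_sd(children, households)
print(f"n = {n}, mean = {mean:.2f}, sum of squares = {ss:.2f}, s = {s:.1f}")
```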


TRY IT YOURSELF 8 


Change three of the 6’s in the data set to 4’s. How does this change affect the 
sample mean and sample standard deviation? 
Answer: Page A33 




When a frequency distribution has classes, you can estimate the sample 
mean and the sample standard deviation by using the midpoint of each class. 


Using Midpoints of Classes 


The figure below shows the results of a survey in which 1000 adults were asked 
how much they spend in preparation for personal travel each year. Make a 
frequency distribution for the data. Then use the table to estimate the sample 
mean and the sample standard deviation of the data set. (Adapted from Travel 
Industry Association of America) 


[Figure: survey results by spending class; for example, 230 adults spend $100-$199 per year.]


SOLUTION

Begin by using a frequency distribution to organize the data. Because the class of $500 or more is open-ended, you must choose a value to represent the midpoint, such as 599.5.

Class     x       f          xf             x − x̄    (x − x̄)²      (x − x̄)²f
0-99      49.5    380        18,810         −142.5   20,306.25     7,716,375.0
100-199   149.5   230        34,385         −42.5    1,806.25      415,437.5
200-299   249.5   210        52,395         57.5     3,306.25      694,312.5
300-399   349.5   50         17,475         157.5    24,806.25     1,240,312.5
400-499   449.5   60         26,970         257.5    66,306.25     3,978,375.0
500+      599.5   70         41,965         407.5    166,056.25    11,623,937.5
                  Σ = 1000   Σ = 192,000                           Σ = 25,668,750.0

$\bar{x} = \frac{\Sigma xf}{n} = \frac{192{,}000}{1000} = 192$   Sample mean

Use the sum of squares to find the sample standard deviation.

$s = \sqrt{\frac{\Sigma (x - \bar{x})^2 f}{n - 1}} = \sqrt{\frac{25{,}668{,}750}{999}} \approx 160.3$   Sample standard deviation

So, an estimate for the sample mean is $192 per year, and an estimate for the
sample standard deviation is $160.30 per year.
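The same calculation works with class midpoints in place of exact values. This sketch uses the midpoints and frequencies from the table. The value 599.5 for the open-ended "$500 or more" class is a choice, as the solution notes, so the results are estimates that depend on it. The helper name is hypothetical.

```python
import math

def grouped_mean_sd(midpoints, freqs):
    """Hypothetical helper: estimate the sample mean and sample SD from class midpoints."""
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(midpoints, freqs)) / n
    ss = sum((x - mean) ** 2 * f for x, f in zip(midpoints, freqs))  # Σ(x − x̄)²f
    return mean, ss, math.sqrt(ss / (n - 1))

# Midpoints from the table; 599.5 is the chosen value for the open-ended class
midpoints = [49.5, 149.5, 249.5, 349.5, 449.5, 599.5]
freqs = [380, 230, 210, 50, 60, 70]

mean, ss, s = grouped_mean_sd(midpoints, freqs)
print(f"estimated mean = {mean:.2f}, sum of squares = {ss:,.1f}, estimated s = {s:.1f}")
```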


TRY IT YOURSELF 9 


In the frequency distribution in Example 9, 599.5 was chosen as the midpoint
for the class of $500 or more. How do the sample mean and standard
deviation change when the midpoint of this class is 650?

Answer: Page A33




Coefficient of Variation 


To compare variation in different data sets, you can use standard deviation when 
the data sets use the same units of measure and have means that are about the 
same. For data sets with different units of measure or different means, use the 
coefficient of variation. 


DEFINITION 


The coefficient of variation (CV) of a data set describes the standard
deviation as a percent of the mean.

Population: $CV = \frac{\sigma}{\mu} \cdot 100\%$     Sample: $CV = \frac{s}{\bar{x}} \cdot 100\%$


Note that the coefficient of variation measures the variation of a data set 
relative to the mean of the data. 


Comparing Variation in Different Data Sets 


The table below shows the population heights (in inches) and weights (in 
pounds) of the members of a basketball team. Find the coefficient of variation 
for the heights and the weights. Then compare the results. 


Heights and Weights of a Basketball Team 


Heights   72    74    68    76    74    69    72    79    70    69    77    73
Weights   180   168   225   201   189   192   197   162   174   171   185   210


SOLUTION 


The mean height is μ ≈ 72.8 inches with a standard deviation of σ ≈ 3.3 inches.
The coefficient of variation for the heights is

$CV_{\text{height}} = \frac{\sigma}{\mu} \cdot 100\% = \frac{3.3}{72.8} \cdot 100\% \approx 4.5\%.$

The mean weight is μ ≈ 187.8 pounds with a standard deviation of
σ ≈ 17.7 pounds. The coefficient of variation for the weights is

$CV_{\text{weight}} = \frac{\sigma}{\mu} \cdot 100\% = \frac{17.7}{187.8} \cdot 100\% \approx 9.4\%.$

Interpretation The weights (9.4%) are more variable than the heights (4.5%).
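Here is a minimal Python sketch of the same comparison using the team data from the table. Because the table lists the whole team, it is treated as a population and uses `statistics.pstdev`. Working with unrounded means and standard deviations gives the same rounded percents as the solution. The function name `coefficient_of_variation` is hypothetical.

```python
import statistics

heights = [72, 74, 68, 76, 74, 69, 72, 79, 70, 69, 77, 73]              # inches
weights = [180, 168, 225, 201, 189, 192, 197, 162, 174, 171, 185, 210]  # pounds

def coefficient_of_variation(data, population=True):
    """Hypothetical helper: CV = (standard deviation / mean) * 100, as a percent."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data) if population else statistics.stdev(data)
    return sd / mean * 100

# The team is treated as a population, so the population standard deviation is used
cv_heights = coefficient_of_variation(heights)
cv_weights = coefficient_of_variation(weights)
print(f"CV heights = {cv_heights:.1f}%, CV weights = {cv_weights:.1f}%")
```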


TRY IT YOURSELF 10 


Find the coefficient of variation for the office rental rates in Los Angeles 
(see Example 4) and for those in Dallas (see Try It Yourself 4). Then compare 
the results. 

Answer: Page A33 


2.4 EXERCISES


For Extra Help: MyLab Statistics 


Building Basic Skills and Vocabulary 


1. Explain how to find the range of a data set. What is an advantage of using the range as a measure of variation? What is a disadvantage?

2. Explain how to find the deviation of an entry in a data set. What is the sum of all the deviations in any data set?

3. Why is the standard deviation used more frequently than the variance?

4. Explain the relationship between variance and standard deviation. Can either of these measures be negative? Explain.

5. Describe the difference between the calculation of population standard deviation and that of sample standard deviation.

6. Given a data set, how do you know whether to calculate σ or s?

7. Discuss the similarities and the differences between the Empirical Rule and Chebychev’s Theorem.

8. What must you know about a data set before you can use the Empirical Rule?


Using and Interpreting Concepts 


Finding the Range of a Data Set In Exercises 9 and 10, find the range of the data set represented by the graph.

9. Median Annual Income by State
[Figure: histogram of frequency versus income (in thousands of dollars), 40 to 75. (Source: U.S. Census Bureau)]

10. [Figure]

11. Atmosphere The altitudes (in kilometers) at which helium makes up the majority of the atmosphere above 10 different cities are listed.

938.5 927.0 929.5 930.3 934.3 936.0 926.2 930.5 924.8 870.7

(a) Find the range of the data set.
(b) Change 870.7 to 807.7 and find the range of the new data set.

12. In Exercise 11, compare your answer to part (a) with your answer to part (b). How do outliers affect the range of a data set?

Finding Population Statistics In Exercises 13 and 14, find the range, mean, variance, and standard deviation of the population data set.

13. Fire History The numbers of deaths caused by fire per year from 1990 to 2005 in New South Wales (Source: Australian Institute of Criminology)

8 13 2 11 5 5 14 1 63 13 3
4 2 4 3 3




14. Density The densities (in kilograms per cubic meter) of the ten most abundant elements by weight in Earth’s crust

1.4 2330 2700 7870 1500
970 900 1740 4500 0.09


Finding Sample Statistics In Exercises 15 and 16, find the range, mean, variance, and standard deviation of the sample data set.

15. Ages of Students The ages (in years) of a random sample of students in a campus dining hall

18 18 19 15 16 18 18 19 21 21
23 18 16 19 15 16 15 17 20 18

16. Germination The durations (in days) of germination for a random sample of seeds

25 29 23 24 26 21 28
29 25 26 24 28 26 25
25 26 29 23 21 25 24

17. Estimating Standard Deviation Both data sets shown in the histograms have a mean of 50. One has a standard deviation of 2.4, and the other has a standard deviation of 5. By looking at the histograms, which is which? Explain your reasoning.

[Figure: histograms (a) and (b) of frequency versus data entry, 42 to 60.]

18. Estimating Standard Deviation Both data sets shown in the stem-and-leaf plots have a mean of 165. One has a standard deviation of 16, and the other has a standard deviation of 24. By looking at the stem-and-leaf plots, which is which? Explain your reasoning.

(a) Key: 12|8 = 128
12 | 8 9
13 | 5 5 8
14 | 1 2
15 | 0 0 6 7
16 | 4 5 9
17 | 1 3 6 8
18 | 0 8 9
19 | 6
20 | 3 5 7

(b) Key: 13|1 = 131
12 |
13 | 1
14 | 2 3 5
15 | 0 4 5 6 8
16 | 1 1 2 3 3 3
17 | 1 5 8 8
18 | 2 3 4 5
19 | 0 2
20 |

19. Salary Offers You are applying for jobs at two companies. Company A offers starting salaries with μ = $30,000 and σ = $4,000. Company B offers starting salaries with μ = $30,000 and σ = $2,000. From which company are you more likely to get an offer of $36,000 or more? Explain your reasoning.

20. Salary Offers You are applying for jobs at two companies. Company C offers starting salaries with μ = $75,000 and σ = $2,500. Company D offers starting salaries with μ = $75,000 and σ = $5,000. From which company are you more likely to get an offer of $85,000 or more? Explain your reasoning.

Graphical Analysis In Exercises 21-24, you are asked to compare three data sets. (a) Without calculating, determine which data set has the greatest sample standard deviation and which has the least sample standard deviation. Explain your reasoning. (b) How are the data sets the same? How do they differ? (c) Estimate the sample standard deviations. Then determine how close each of your estimates is by finding the sample standard deviations.


21. [Figure: three graphs, (i), (ii), and (iii), of data entries from 4 to 10]

23. [Figure: three stem-and-leaf plots, (i), (ii), and (iii); Key: 1|5 = 15]

24. [Figure: three graphs, (i), (ii), and (iii), of data entries from 1 to 8]

Constructing Data Sets In Exercises 25-28, construct a data set that has the given statistics.

26. N = 9, μ = 8, σ ≈ 6

28. n = 5, x̄ = 8, s ≈ 4


Using the Empirical Rule In Exercises 29-34, use the Empirical Rule.

29. The mean speed of a sample of vehicles along a stretch of highway is 67 miles per hour, with a standard deviation of 4 miles per hour. Estimate the percent of vehicles whose speeds are between 63 miles per hour and 71 miles per hour. (Assume the data set has a bell-shaped distribution.)

30. The mean monthly utility bill for a sample of households in a city is $70, with a standard deviation of $8. Between what two values do about 95% of the data lie? (Assume the data set has a bell-shaped distribution.)

31. Use the sample statistics from Exercise 29 and assume the number of vehicles in the sample is 75.
(a) Estimate the number of vehicles whose speeds are between 63 miles per hour and 71 miles per hour.
(b) In a sample of 25 additional vehicles, about how many vehicles would you expect to have speeds between 63 miles per hour and 71 miles per hour?

32. Use the sample statistics from Exercise 30 and assume the number of households in the sample is 40.
(a) Estimate the number of households whose monthly utility bills are between $54 and $86.
(b) In a sample of 20 additional households, about how many households would you expect to have monthly utility bills between $54 and $86?

33. The speeds for eight vehicles are listed. Using the sample statistics from Exercise 29, determine which of the data entries are unusual. Are any of the data entries very unusual? Explain your reasoning.

70, 78, 62, 71, 65, 76, 82, 64

34. The monthly utility bills for eight households are listed. Using the sample statistics from Exercise 30, determine which of the data entries are unusual. Are any of the data entries very unusual? Explain your reasoning.

$65, $52, $63, $83, $77, $98, $84, $70

35. Using Chebychev’s Theorem You are conducting a survey on the number of people per house in your region. From a sample with n = 60, the mean number of people per house is 3 and the standard deviation is 1 person. Using Chebychev’s Theorem, determine at least how many of the households have 0 to 6 people.

36. Using Chebychev’s Theorem Old Faithful is a famous geyser at Yellowstone National Park. From a sample with n = 100, the mean interval between Old Faithful’s eruptions is 101.56 minutes and the standard deviation is 42.69 minutes. Using Chebychev’s Theorem, determine at least how many of the intervals lasted between 16.18 minutes and 186.94 minutes. (Adapted from Geyser Times)

37. Using Chebychev’s Theorem The mean height of students in a class is 125 centimeters, with a standard deviation of 4 centimeters. Apply Chebychev’s Theorem to the data using k = 2. Interpret the results.

38. Using Chebychev’s Theorem The mean number of runs per game scored by the Chicago Cubs during the 2016 World Series was 3.86 runs, with a standard deviation of 3.36 runs. Apply Chebychev’s Theorem to the data using k = 2. Interpret the results. (Adapted from Major League Baseball)


Finding the Sample Mean and Standard Deviation for Grouped Data In Exercises 39 and 40, make a frequency distribution for the data. Then use the table to find the sample mean and the sample standard deviation of the data set.

Estimating the Sample Mean and Standard Deviation for Grouped Data In Exercises 41-44, make a frequency distribution for the data. Then use the table to estimate the sample mean and the sample standard deviation of the data set.


41. College Expenses The distribution of the tuitions, fees, and room and board charges of a random sample of public 4-year degree-granting postsecondary institutions is shown in the pie chart. Use $26,249.50 as the midpoint for “$25,000 or more.”

[Figure for Exercise 41: pie chart with classes including $15,000-$17,499, $17,500-$19,999, $20,000-$22,499, $22,500-$24,999, and $25,000 or more]

42. Weekly Study Hours The distribution of the numbers of hours that a random sample of college students study per week is shown in the pie chart. Use 32 as the midpoint for “30+ hours.”

[Figure for Exercise 42: pie chart with classes including 0-4 hours, 10-14 hours, 15-19 hours, and 30+ hours]

43. Teaching Load The numbers of courses taught per semester by a random sample of university professors are shown in the histogram.

[Figure: histogram of number of professors versus number of courses taught per semester (1, 2, 3, 4)]

44. Amounts of Caffeine The amounts of caffeine in a sample of five-ounce servings of brewed coffee are shown in the histogram.

[Figure: histogram of number of 5-ounce servings versus caffeine (in milligrams), with bars at 70.5, 92.5, 114.5, 136.5, and 158.5]


Comparing Variation in Different Data Sets In Exercises 45-50, find the coefficient of variation for each of the two data sets. Then compare the results.


45. Annual Salaries Sample annual salaries (in thousands of dollars) for entry level architects in Denver, CO, and Los Angeles, CA, are listed.

Denver        45.8 46.4 44.4 40.7 51.5 39.5
              44.2 53.1 44.8 51.6 41.3 49.0
Los Angeles   56.7 50.6 56.0 48.5 55.7 55.6
              47.6 56.3 48.1 46.3 51.9 61.2

46. Wealth The wealth (in billions of dollars) of a sample of billionaires in Africa and Asia is listed.

Africa   12.2 7.7 7.2 6.8 5.3 4 4 2.8 2.7
Asia     33.3 24.8 24.2 22.7 21.1 21 20.2 20 19.1
47. Ages and Heights The ages (in years) and heights (in inches) of all members of the 2016 Women’s U.S. Olympic swimming team are listed. (Source: USA Swimming)

Ages      24 24 19 23 22 21 21 24 19 19 19
          24 25 21 20 26 21 21 28 30 19 21
Heights   70 70 68 66 69 67 74 68 69 71 71
          68 67 70 75 73 74 69 73 73 70 71

48. Ages and Weights The ages (in years) and weight classes (in kilograms) of all members of the 2016 Men’s U.S. Olympic wrestling team are listed. (Source: U.S. Olympic Committee)

Ages             24 29 26 29 29 28 21 30 27 20
Weight Classes   59 75 85 130 57 74 86 125 65 97

49. Sample Weight Averages Sample weight averages (in kilograms) for 10 males and 10 females are listed.

Males     70 72 75 69 64 75 60 71 73 72
Females   65 66 68 61 64 69 65 63 62 65

50. Sample Height Averages Sample height averages (in centimeters) for 10 males and 10 females are listed.

Males     167 165 160 181 190 175 178 164 168 155
Females   145 170 138 140 151 171 142 151 153 157


Extending Concepts 


51. Alternative Formula You used $SS_x = \Sigma (x - \bar{x})^2$ when calculating variance and standard deviation. An alternative formula that is sometimes more convenient for hand calculations is

$$SS_x = \Sigma x^2 - \frac{(\Sigma x)^2}{n}.$$

You can find the sample variance by dividing the sum of squares by n − 1 and the sample standard deviation by finding the square root of the sample variance.
(a) Show how to obtain the alternative formula.
(b) Use the alternative formula to calculate the sample standard deviation for the data set in Exercise 15.
(c) Compare your result with the sample standard deviation obtained in Exercise 15.


52. Mean Absolute Deviation Another useful measure of variation for a data set is the mean absolute deviation (MAD). It is calculated by the formula

$$MAD = \frac{\Sigma |x - \bar{x}|}{n}.$$

(a) Find the mean absolute deviation of the data set in Exercise 15. Compare your result with the sample standard deviation obtained in Exercise 15.
(b) Find the mean absolute deviation of the data set in Exercise 16. Compare your result with the sample standard deviation obtained in Exercise 16.

53. Scaling Data Sample annual salaries (in thousands of dollars) for employees at a company are listed.

42 36 48 51 39 39 42
36 48 33 39 42 45 50

(a) Find the sample mean and the sample standard deviation.
(b) Each employee in the sample receives a 5% raise. Find the sample mean and the sample standard deviation for the revised data set.
(c) Find each monthly salary. Then find the sample mean and the sample standard deviation for the monthly salaries.
(d) What can you conclude from the results of (a), (b), and (c)?

54. Shifting Data Sample annual salaries (in thousands of dollars) for employees at a company are listed.

40 35 49 53 38 39 40
37 49 34 38 43 47 35

(a) Find the sample mean and the sample standard deviation.
(b) Each employee in the sample receives a $1000 raise. Find the sample mean and the sample standard deviation for the revised data set.
(c) Each employee in the sample takes a pay cut of $2000 from their original salary. Find the sample mean and the sample standard deviation for the revised data set.
(d) What can you conclude from the results of (a), (b), and (c)?

55. Pearson’s Index of Skewness The English statistician Karl Pearson (1857-1936) introduced a formula for the skewness of a distribution.

$$P = \frac{3(\bar{x} - \text{median})}{s}$$   Pearson’s index of skewness

Most distributions have an index of skewness between −3 and 3. When P > 0, the data are skewed right. When P < 0, the data are skewed left. When P = 0, the data are symmetric. Calculate Pearson’s index of skewness for each distribution. Describe the shape of each.

(a) x̄ = 17, s = 2.3, median = 19
(b) x̄ = 32, s = 5.1, median = 25
(c) x̄ = 9.2, s = 1.8, median = 9.2
(d) x̄ = 42, s = 6.0, median = 40
(e) x̄ = 155, s = 20.0, median = 175

56. Chebychev’s Theorem At least 99% of the data in any data set lie within how many standard deviations of the mean? Explain how you obtained your answer.


ACTIVITY 2.4 Standard Deviation

APPLET You can find the interactive applet for this activity within MyLab Statistics or at www.pearsonglobaleditions

The standard deviation applet is designed to allow you to investigate interactively the standard deviation as a measure of spread for a data set. Points can be added to the plot by clicking the mouse above the horizontal axis. The mean of the points is shown as a green arrow. A numeric value for the standard deviation is shown above the plot. Points on the plot can be removed by clicking on the point and then dragging the point into the trash can. All of the points on the plot can be removed by simply clicking inside the trash can. The range of values for the horizontal axis can be specified by inputting lower and upper limits and then clicking UPDATE.

[Figure: applet window with a horizontal axis from 2 to 8, a trash can, and the fields Lower Limit: 1, Upper Limit: 9, and Update.]

Step 1 Specify a lower limit.
Step 2 Specify an upper limit.
Step 3 Add 15 points to the plot.
Step 4 Remove all of the points from the plot.

DRAW CONCLUSIONS

1. Specify the lower limit to be 10 and the upper limit to be 20. Plot 10 points that have a mean of about 15 and a standard deviation of about 3. Write the estimates of the values of the points. Plot a point with a value of 15. What happens to the mean and standard deviation? Plot a point with a value of 20. What happens to the mean and standard deviation?

2. Specify the lower limit to be 30 and the upper limit to be 40. How can you plot eight points so that the points have the greatest possible standard deviation? Use the applet to plot the set of points and then use the formula for standard deviation to confirm the value given in the applet. How can you plot eight points so that the points have the least possible standard deviation? Explain.




Business Size

The numbers of employees at businesses can vary. A business can have anywhere from a single employee to more than 1000 employees. The data shown below are the numbers of manufacturing businesses for nine states in a recent year. (Source: U.S. Census Bureau)

State          Number of manufacturing businesses
California     38,293
Illinois       13,531
Indiana        8,036
Michigan       12,361
New York       16,076
Ohio           14,208
Pennsylvania   13,684
Texas          19,681
Wisconsin      8,858

Number of Manufacturing Businesses by Number of Employees

State          1-4      5-9     10-19   20-49   50-99   100-249   250-499   500-999   1000+
California     15,320   7,074   5,862   5,494   2,276   1,609     433       144       81
Illinois       4,683    2,234   2,103   2,165   1,123   852       241       96        34
Indiana        2,225    1,319   1,276   1,403   797     640       229       99        48
Michigan       4,055    2,103   2,008   2,044   974     800       254       76        47
New York       7,048    2,810   2,342   2,134   885     587       171       75        24
Ohio           4,274    2,469   2,281   2,495   1,233   982       311       112       51
Pennsylvania   4,505    2,292   2,185   2,335   1,125   860       268       79        35
Texas          7,019    3,409   2,994   3,078   1,501   1,114     358       145       63
Wisconsin      2,657    1,372   1,342   1,520   889     725       227       97        29

Use the information given in the above tables.

1. Employees Which state has the greatest number of manufacturing employees? Explain your reasoning.

2. Mean Business Size Estimate the mean number of employees at a manufacturing business for each state. Use 1500 as the midpoint for “1000+.”

3. Employees Which state has the greatest number of employees per manufacturing business? Explain your reasoning.

4. Standard Deviation Estimate the standard deviation for the number of employees at a manufacturing business for each state. Use 1500 as the midpoint for “1000+.”

5. Standard Deviation Which state has the greatest standard deviation? Explain your reasoning.

6. Distribution Describe the distribution of the number of employees at manufacturing businesses for each state.




2.5

What You Should Learn
» How to find the first, second, and third quartiles of a data set, how to find the interquartile range of a data set, and how to represent a data set graphically using a box-and-whisker plot
» How to interpret other fractiles such as percentiles, and how to find percentiles for a specific data entry
» How to find and interpret the standard score (z-score)

Quartiles ■ Percentiles and Other Fractiles ■ The Standard Score

Quartiles

In this section, you will learn how to use fractiles to specify the position of a data entry within a data set. Fractiles are numbers that partition, or divide, an ordered data set into equal parts (each part has the same number of data entries). For instance, the median is a fractile because it divides an ordered data set into two equal parts.

DEFINITION

The three quartiles, Q₁, Q₂, and Q₃, divide an ordered data set into four equal parts. About one-quarter of the data fall on or below the first quartile Q₁. About one-half of the data fall on or below the second quartile Q₂ (the second quartile is the same as the median of the data set). About three-quarters of the data fall on or below the third quartile Q₃.


EXAMPLE 1

Finding the Quartiles of a Data Set


Each year in the U.S., automobile commuters waste fuel due to traffic 
congestion. The amounts (in gallons per year) of fuel wasted by commuters in 
the 15 largest U.S. urban areas are listed. (Large urban areas have populations 
of at least 3 million.) Find the first, second, and third quartiles of the data set. 
What do you observe? (Source: Based on 2015 Urban Mobility Scorecard) 


20 30 29 22 25 29 25 24 35 23 25 11 33 28 35 


SOLUTION 


First, order the data set and find the median $Q_2$. The first quartile $Q_1$ is the median of the data entries to the left of $Q_2$. The third quartile $Q_3$ is the median of the data entries to the right of $Q_2$.

11 20 22 23 24 25 25 25 28 29 29 30 33 35 35

Data entries to the left of $Q_2$: 11 20 22 23 24 25 25, so $Q_1 = 23$.
Middle (8th) entry: $Q_2 = 25$.
Data entries to the right of $Q_2$: 28 29 29 30 33 35 35, so $Q_3 = 30$.


Interpretation In about one-quarter of the large urban areas, auto commuters 
waste 23 gallons of fuel or less, about one-half waste 25 gallons or less, and 
about three-quarters waste 30 gallons or less. 
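As a minimal sketch, the procedure from the solution (order the data, find the median, then take the median of each half) can be written in Python. The function name `quartiles` is our own, and the sketch assumes the convention used in Example 1: when the number of entries is odd, the median itself is left out of both halves.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method (assumed convention).

    For an odd number of entries, the middle entry (Q2) is excluded from
    both halves; for an even number, the data split evenly.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]              # entries to the left of Q2
    upper = values[half + (n % 2):]    # entries to the right of Q2
    return median(lower), median(values), median(upper)


fuel = [20, 30, 29, 22, 25, 29, 25, 24, 35, 23, 25, 11, 33, 28, 35]
print(quartiles(fuel))  # (23, 25, 30)
```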


TRY IT YOURSELF 1 


Find the first, second, and third quartiles for the points scored by the 51 winning 
teams using the data set listed on page 61. What do you observe? 
Answer: Page A33 


SECTION 2.5 Measures of Position 125 


EXAMPLE 2

Using Technology to Find Quartiles


The tuition costs (in thousands of dollars) for 25 liberal arts colleges are listed. 
Use technology to find the first, second, and third quartiles. What do you 
observe? (Source: U.S. News & World Report) 


50 52 51 49 52 51 25 41 47 36 30 44 40 
35 40 45 34 33 23 34 27 16 18 18 35 


SOLUTION 


Minitab and the TI-84 Plus each have features that calculate quartiles. Try using this technology to find the first, second, and third quartiles of the tuition data. From the displays, you can see that

$Q_1 = 28.5$, $Q_2 = 36$, and $Q_3 = 48$.
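Different tools follow different quartile conventions. Here is a sketch in Python, assuming Python 3.8 or later for `statistics.quantiles`. On this tuition data, the median-of-halves method from Example 1 and the `"exclusive"` option both give the Minitab and TI-84 Plus values. The `"inclusive"` option gives the values that Excel's QUARTILE.INC returns. The helper name `median_of_halves` is hypothetical.

```python
from statistics import median, quantiles

tuition = [50, 52, 51, 49, 52, 51, 25, 41, 47, 36, 30, 44, 40,
           35, 40, 45, 34, 33, 23, 34, 27, 16, 18, 18, 35]


def median_of_halves(data):
    # Same convention as Example 1: exclude the middle entry when n is odd.
    values = sorted(data)
    half = len(values) // 2
    return (median(values[:half]), median(values),
            median(values[half + len(values) % 2:]))


# statistics.quantiles with n=4 returns the three cut points [Q1, Q2, Q3].
exclusive = quantiles(tuition, n=4, method="exclusive")  # position (n + 1)p
inclusive = quantiles(tuition, n=4, method="inclusive")  # position (n - 1)p + 1

print(median_of_halves(tuition))  # (28.5, 36, 48.0)
print(exclusive)                  # [28.5, 36.0, 48.0]
print(inclusive)                  # [30.0, 36.0, 47.0]
```

Neither convention is "wrong". They interpolate between ordered entries in different ways, which is why the first quartile can come out as 28.5 in one tool and 30 in another.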


Tech Tip
Note that you may get results that differ slightly when comparing results obtained by different technology tools. For instance, in Example 2, the first quartile, as determined by Minitab and the TI-84 Plus, is 28.5, whereas the result using Excel is 30 (see below).

MINITAB
Descriptive Statistics: Tuition
Variable  N   Mean   SE Mean  StDev  Minimum  Q1     Median  Q3     Maximum
Tuition   25  37.04  2.27     11.36  16.00    28.50  36.00   48.00  52.00

TI-84 PLUS
n=25
minX=16
Q1=28.5
Med=36
Q3=48
maxX=52

EXCEL (tuition data in cells A1:A25)
QUARTILE.INC(A1:A25,1)   30
QUARTILE.INC(A1:A25,2)   36
QUARTILE.INC(A1:A25,3)   47

STATCRUNCH
Summary statistics:
Column    Q1   Median   Q3
Tuition   30   36       47

Interpretation About one-quarter of these colleges charge tuition of $28,500 or less; about one-half charge $36,000 or less; and about three-quarters charge $48,000 or less.

TRY IT YOURSELF 2

The tuition costs (in thousands of dollars) for 25 universities are listed. Use technology to find the first, second, and third quartiles. What do you observe? (Source: U.S. News & World Report)

44 30 38 23 20 29 19 44 29 17 45 39 29
18 43 45 39 24 44 26 34 20 35 30 36
Answer: Page A33

The median (the second quartile) is a measure of central tendency based on position. A measure of variation that is based on position is the interquartile range. The interquartile range tells you the spread of the middle half of the data, as shown in the next definition.




DEFINITION 


The interquartile range (IQR) of a data set is a measure of variation that gives the range of the middle portion (about half) of the data. The IQR is the difference between the third and first quartiles.

$$\text{IQR} = Q_3 - Q_1$$


In Section 2.3, an outlier was described as a data entry that is far removed 


from the other entries in the data set. One way to identify outliers is to use the 
interquartile range. 


GUIDELINES 


Using the Interquartile Range to Identify Outliers 

1. Find the first ($Q_1$) and third ($Q_3$) quartiles of the data set.
2. Find the interquartile range: $\text{IQR} = Q_3 - Q_1$.
3. Multiply IQR by 1.5: $1.5(\text{IQR})$.
4. Subtract 1.5(IQR) from $Q_1$. Any data entry less than $Q_1 - 1.5(\text{IQR})$ is an outlier.
5. Add 1.5(IQR) to $Q_3$. Any data entry greater than $Q_3 + 1.5(\text{IQR})$ is an outlier.
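The five guidelines can be sketched in Python as follows. The function `iqr_outliers` is a hypothetical helper. It assumes the median-of-halves quartiles from Example 1, so other quartile conventions may move the fences slightly.

```python
from statistics import median


def iqr_outliers(data):
    """Apply the 1.5(IQR) guidelines; return (lower fence, upper fence, outliers)."""
    values = sorted(data)
    half = len(values) // 2
    q1 = median(values[:half])                     # step 1: Q1 ...
    q3 = median(values[half + len(values) % 2:])   # ... and Q3 (median of halves)
    iqr = q3 - q1                                  # step 2: IQR = Q3 - Q1
    step = 1.5 * iqr                               # step 3: 1.5(IQR)
    lower, upper = q1 - step, q3 + step            # steps 4 and 5: the fences
    outliers = [x for x in values if x < lower or x > upper]
    return lower, upper, outliers


fuel = [20, 30, 29, 22, 25, 29, 25, 24, 35, 23, 25, 11, 33, 28, 35]
print(iqr_outliers(fuel))  # (12.5, 40.5, [11])
```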


EXAMPLE 3

Using the Interquartile Range to Identify an Outlier
Find the interquartile range of the data set in Example 1. Are there any outliers? 


SOLUTION 
From Example 1, you know that $Q_1 = 23$ and $Q_3 = 30$. So, the interquartile range is $\text{IQR} = Q_3 - Q_1 = 30 - 23 = 7$. To identify any outliers, first note that $1.5(\text{IQR}) = 1.5(7) = 10.5$. There is a data entry, 11, that is less than

$$Q_1 - 1.5(\text{IQR}) = 23 - 10.5 = 12.5$$

(Subtract 1.5(IQR) from $Q_1$. A data entry less than 12.5 is an outlier.)

but there are no data entries greater than

$$Q_3 + 1.5(\text{IQR}) = 30 + 10.5 = 40.5.$$

(Add 1.5(IQR) to $Q_3$. A data entry greater than 40.5 is an outlier.)

So, 11 is an outlier.


Interpretation In large urban areas, the amount of fuel wasted by auto commuters in the middle half of the data set varies by at most 7 gallons. Notice that the outlier, 11, does not affect the IQR.


TRY IT YOURSELF 3 


Find the interquartile range for the points scored by the 51 winning teams 
listed on page 61. Are there any outliers? 
Answer: Page A33 


Another important application of quartiles is to represent data sets using 


box-and-whisker plots. A box-and-whisker plot (or boxplot) is an exploratory 
data analysis tool that highlights the important features of a data set. To graph a 
box-and-whisker plot, you must know the values shown at the top of the next page. 




1. The minimum entry
2. The first quartile $Q_1$
3. The median $Q_2$
4. The third quartile $Q_3$
5. The maximum entry

These five numbers are called the five-number summary of the data set.

GUIDELINES
Drawing a Box-and-Whisker Plot

1. Find the five-number summary of the data set.
2. Construct a horizontal scale that spans the range of the data.
3. Plot the five numbers above the horizontal scale.
4. Draw a box above the horizontal scale from $Q_1$ to $Q_3$ and draw a vertical line in the box at $Q_2$.
5. Draw whiskers from the box to the minimum and maximum entries.

[Figure: a box-and-whisker plot. The box spans $Q_1$ to $Q_3$ with a vertical line at the median $Q_2$; one whisker extends left to the minimum entry and the other extends right to the maximum entry.]

See Minitab and TI-84 Plus steps on pages 146 and 147.

Picturing the World

Since 1970, there have been 2845 fatalities in the United States attributed to lightning strikes. The box-and-whisker plot summarizes the fatalities for each year since 1970. (Source: National Weather Service)

[Figure: Lightning Fatalities, box-and-whisker plot; horizontal axis: fatalities per year since 1970, scale 20 to 120]

About how many fatalities are represented by the right whisker? There were 27 lightning fatalities in 2015. Into what quartile does this number of fatalities fall?


EXAMPLE 4

Drawing a Box-and-Whisker Plot


Draw a box-and-whisker plot that represents the data set in Example 1. What 
do you observe? 


SOLUTION Here is the five-number summary of the data set.

Minimum = 11    $Q_1 = 23$    $Q_2 = 25$    $Q_3 = 30$    Maximum = 35

Using these five numbers, you can construct the box-and-whisker plot shown.


[Figure: Gallons of Fuel Wasted Per Year, box-and-whisker plot on a scale from 5 to 40, marked at 11, 23, 25, 30, and 35]

Study Tip
For data sets that have outliers, you can represent them graphically using a modified box-and-whisker plot. A modified box-and-whisker plot is a box-and-whisker plot that uses symbols (such as an asterisk or a point) to indicate outliers. The horizontal line of a modified box-and-whisker plot extends as far as the minimum data entry that is not an outlier and the maximum data entry that is not an outlier. For instance, on pages 146 and 147 Minitab and the TI-84 Plus were used to draw modified box-and-whisker plots that represent the data set in Example 1. Compare these results with the plot in Example 4.


Interpretation The box represents about half of the data, which means about 
50% of the data entries are between 23 and 30. The left whisker represents 
about one-quarter of the data, so about 25% of the data entries are less 
than 23. The right whisker represents about one-quarter of the data, so about 
25% of the data entries are greater than 30. Also, the length of the left whisker 
is much longer than the right one. This indicates that the data set has a possible 
outlier to the left. (You already know from Example 3 that the data entry of
11 is an outlier.)
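Here is a short Python sketch of the five-number summary used in Example 4, plus the whisker ends of a modified box-and-whisker plot as described in the Study Tip. Both function names are hypothetical, and the quartiles again assume the median-of-halves convention from Example 1.

```python
from statistics import median


def five_number_summary(data):
    """Return (minimum, Q1, Q2, Q3, maximum) using median-of-halves quartiles."""
    values = sorted(data)
    half = len(values) // 2
    q1 = median(values[:half])
    q3 = median(values[half + len(values) % 2:])
    return values[0], q1, median(values), q3, values[-1]


def modified_whisker_ends(data):
    """Smallest and largest entries that are not outliers under the 1.5(IQR) rule."""
    _, q1, _, q3, _ = five_number_summary(data)
    step = 1.5 * (q3 - q1)
    inside = [x for x in data if q1 - step <= x <= q3 + step]
    return min(inside), max(inside)


fuel = [20, 30, 29, 22, 25, 29, 25, 24, 35, 23, 25, 11, 33, 28, 35]
print(five_number_summary(fuel))    # (11, 23, 25, 30, 35)
print(modified_whisker_ends(fuel))  # (20, 35)
```

In a modified plot, the left whisker would stop at 20, and the outlier 11 would be drawn as a separate symbol.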


TRY IT YOURSELF 4 


Draw a box-and-whisker plot that represents the points scored by the 
51 winning teams listed on page 61. What do you observe? 
Answer: Page A33 


You can use a box-and-whisker plot to determine the shape of a distribution. 
Notice that the box-and-whisker plot in Example 4 represents a distribution that 
is skewed left. 




Percentiles and Other Fractiles 


In addition to using quartiles to specify a measure of position, you can also use 
percentiles and deciles. Here is a summary of these common fractiles. 


Study Tip
Notice that the 25th percentile is the same as $Q_1$; the 50th percentile is the same as $Q_2$, or the median; and the 75th percentile is the same as $Q_3$.

Fractiles      Summary                                   Symbols
Quartiles      Divide a data set into 4 equal parts.     $Q_1, Q_2, Q_3$
Deciles        Divide a data set into 10 equal parts.    $D_1, D_2, D_3, \ldots, D_9$
Percentiles    Divide a data set into 100 equal parts.   $P_1, P_2, P_3, \ldots, P_{99}$


Percentiles are often used in education and health-related fields to indicate 
how one individual compares with others in a group. Percentiles can also be 
used to identify unusually high or unusually low values. For instance, children’s 
growth measurements are often expressed in percentiles. Measurements in the 
95th percentile and above are unusually high, while those in the 5th percentile
and below are unusually low. 


Study Tip
Be sure you understand what a percentile means. For instance, the weight of a six-month-old infant is at the 78th percentile. This means the infant weighs the same as or more than 78% of all six-month-old infants. It does not mean that the infant weighs 78% of some ideal weight.

EXAMPLE 5

Interpreting Percentiles

The ogive at the right represents the cumulative frequency distribution for SAT scores of college-bound students in a recent year. What score represents the 80th percentile? (Source: The College Board)

[Figure: SAT Scores ogive; horizontal axis: score, 400 to 1600; vertical scale 10 to 100; a dashed line marks the 80th percentile]

SOLUTION
From the ogive, you can see that the 80th percentile corresponds to a score of 1250.

Interpretation This means that approximately 80% of the students had an SAT score of 1250 or less.

TRY IT YOURSELF 5

The points scored by the 51 winning teams in the Super Bowl (see page 61) are represented in the ogive at the left. What score represents the 10th percentile? How should you interpret this?
Answer: Page A33

[Figure: Points Scored by Super Bowl Winner, ogive]


In Example 5, you used an ogive to approximate a data entry that corresponds 
to a percentile. You can also use an ogive to approximate a percentile that 
corresponds to a data entry. Another way to find a percentile is to use a formula. 


DEFINITION

To find the percentile that corresponds to a specific data entry x, use the formula

$$\text{Percentile of } x = \frac{\text{number of data entries less than } x}{\text{total number of data entries}} \cdot 100$$

and then round to the nearest whole number.
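The percentile formula translates directly into Python. The function name `percentile_of` is an assumption for illustration. Note that Python's `round` sends an exact half (such as 32.5) to the nearest even whole number, which may differ from rounding by hand.

```python
def percentile_of(x, data):
    """Percentile of x = (entries less than x) / (total entries) * 100, rounded."""
    below = sum(1 for value in data if value < x)
    # Multiply before dividing so that exact cases like 800 / 25 stay exact.
    return round(below * 100 / len(data))


tuition = [50, 52, 51, 49, 52, 51, 25, 41, 47, 36, 30, 44, 40,
           35, 40, 45, 34, 33, 23, 34, 27, 16, 18, 18, 35]
print(percentile_of(34, tuition))  # 32
```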




EXAMPLE 6

Finding a Percentile
For the data set in Example 2, find the percentile that corresponds to $34,000. 


SOLUTION 


Recall that the tuition costs are in thousands of dollars, so $34,000 is the data 
entry 34. Begin by ordering the data. 


16 18 18 23 25 27 30 33 34 34 35 35 36 
40 40 41 44 45 47 49 50 51 51 52 52 
There are 8 data entries less than 34 and the total number of data entries is 25. 


$$\text{Percentile of } 34 = \frac{\text{number of data entries less than } 34}{\text{total number of data entries}} \cdot 100 = \frac{8}{25} \cdot 100 = 32$$


The tuition cost of $34,000 corresponds to the 32nd percentile. 

Interpretation The tuition cost of $34,000 is greater than 32% of the other 
tuition costs. 

TRY IT YOURSELF 6 


For the data set in Try It Yourself 2, find the percentile that corresponds to 
$26,000, which is the data entry 26. 
Answer: Page A33 


The Standard Score 


When you know the mean and standard deviation of a data set, you can measure 
the position of an entry in the data set with a standard score, or z-score. 


DEFINITION 


The standard score, or z-score, represents the number of standard deviations a value x lies from the mean $\mu$. To find the z-score for a value, use the formula

$$z = \frac{\text{Value} - \text{Mean}}{\text{Standard deviation}} = \frac{x - \mu}{\sigma}.$$


A z-score can be negative, positive, or zero. When z is negative, the 
corresponding x-value is less than the mean. When z is positive, the corresponding 
x-value is greater than the mean. For z = 0, the corresponding x-value is equal 
to the mean. A z-score can be used to identify an unusual value of a data set that 
is approximately bell-shaped. 

When a distribution is approximately bell-shaped, you know from the 
Empirical Rule that about 95% of the data lie within 2 standard deviations of 
the mean. So, when this distribution’s values are transformed to z-scores, about 95% of the z-scores should fall between −2 and 2. A z-score outside of this range will occur about 5% of the time and would be considered unusual. So, according to the Empirical Rule, a z-score less than −3 or greater than 3 would be very unusual, with such a score occurring about 0.3% of the time.

[Figure: z-score number line; usual scores lie between −2 and 2, unusual scores lie beyond −2 or 2, and very unusual scores lie beyond −3 or 3]




EXAMPLE 7

Finding z-Scores

The mean speed of vehicles along a stretch of highway is 56 miles per hour 
with a standard deviation of 4 miles per hour. You measure the speeds of three 
cars traveling along this stretch of highway as 62 miles per hour, 47 miles per 
hour, and 56 miles per hour. Find the z-score that corresponds to each speed. 
Assume the distribution of the speeds is approximately bell-shaped. 


SOLUTION The z-score that corresponds to each speed is calculated below.

$$x = 62 \text{ mph}: \quad z = \frac{62 - 56}{4} = 1.5$$

$$x = 47 \text{ mph}: \quad z = \frac{47 - 56}{4} = -2.25$$

$$x = 56 \text{ mph}: \quad z = \frac{56 - 56}{4} = 0$$

Interpretation From the z-scores, you can conclude that a speed of 62 miles 
per hour is 1.5 standard deviations above the mean; a speed of 47 miles per 
hour is 2.25 standard deviations below the mean; and a speed of 56 miles per 
hour is equal to the mean. The car traveling 47 miles per hour is said to be
traveling unusually slowly, because its speed corresponds to a z-score of −2.25.
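Here is a minimal Python sketch of the z-score formula together with the Empirical Rule cutoffs described earlier (beyond ±2 is unusual, beyond ±3 is very unusual). The helper names `z_score` and `classify` are hypothetical, and the cutoffs only make sense for data that are approximately bell-shaped.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations x lies from the mean: z = (x - mu) / sigma."""
    return (x - mean) / std_dev


def classify(z):
    # Cutoffs follow the Empirical Rule discussion for bell-shaped data.
    if abs(z) > 3:
        return "very unusual"
    if abs(z) > 2:
        return "unusual"
    return "usual"


for speed in (62, 47, 56):
    z = z_score(speed, 56, 4)
    print(speed, z, classify(z))
# 62 1.5 usual
# 47 -2.25 unusual
# 56 0.0 usual
```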


TRY IT YOURSELF 7 

The monthly utility bills in a city have a mean of $70 and a standard deviation 

of $8. Find the z-scores that correspond to utility bills of $60, $71, and $92. 

Assume the distribution of the utility bills is approximately bell-shaped. 
Answer: Page A33 


EXAMPLE 8 


Comparing z-Scores from Different Data Sets 


The table shows the mean heights and standard deviations for a population 
of men and a population of women. Compare the z-scores for a 6-foot-tall 
man and a 6-foot-tall woman. Assume the distributions of the heights are 
approximately bell-shaped. 


Men’s heights: $\mu = 69.9$ in., $\sigma = 3.0$ in.
Women’s heights: $\mu = 64.3$ in., $\sigma = 2.6$ in.

SOLUTION Note that 6 feet = 72 inches. Find the z-score for each height.

z-score for 6-foot-tall man:

$$z = \frac{x - \mu}{\sigma} = \frac{72 - 69.9}{3.0} = 0.7$$

z-score for 6-foot-tall woman:

$$z = \frac{x - \mu}{\sigma} = \frac{72 - 64.3}{2.6} \approx 3.0$$


Interpretation The z-score for the 6-foot-tall man is within 1 standard 
deviation of the mean (69.9 inches). This is among the typical heights for a 
man. The z-score for the 6-foot-tall woman is about 3 standard deviations from 
the mean (64.3 inches). This is an unusual height for a woman. 
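Because each z-score uses its own population's mean and standard deviation, values from different data sets can be compared on one scale. Here is a brief Python sketch of Example 8. The variable names are our own.

```python
def z_score(x, mean, std_dev):
    return (x - mean) / std_dev


height = 72  # 6 feet, in inches
z_man = z_score(height, 69.9, 3.0)    # men's heights: mu = 69.9, sigma = 3.0
z_woman = z_score(height, 64.3, 2.6)  # women's heights: mu = 64.3, sigma = 2.6
print(round(z_man, 2), round(z_woman, 2))  # 0.7 2.96
```

The same 72 inches sits less than 1 standard deviation above the men's mean but almost 3 above the women's mean.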


TRY IT YOURSELF 8 


Use the information in Example 8 to compare the z-scores for a 5-foot-tall man 
and a 5-foot-tall woman. 
Answer: Page A33 




2.5 EXERCISES    For Extra Help: MyLab Statistics


Building Basic Skills and Vocabulary 


1. The length of a guest lecturer’s talk represents the third quartile for talks in 
a guest lecture series. Make an observation about the length of the talk. 


2. A motorcycle’s fuel efficiency represents the ninth decile of vehicles in its 
class. Make an observation about the motorcycle’s fuel efficiency. 


3. A student’s score on the Fundamentals of Engineering exam is in the 89th 
percentile. Make an observation about the student’s exam score. 


4. A student’s IQ score is in the 91st percentile on the Wechsler Adult
Intelligence Scale. Make an observation about the student’s IQ score.


5. Explain how to identify outliers using the interquartile range. 


6. Describe the relationship between quartiles and percentiles. 


True or False? In Exercises 7-10, determine whether the statement is true or
false. If it is false, rewrite it as a true statement.


7. About one-quarter of a data set falls below Q). 
8. The second quartile is the mean of an ordered data set. 
9. An outlier is any number above $Q_3$ or below $Q_1$.


10. It is impossible to have a z-score of 0. 


Using and Interpreting Concepts 


Finding Quartiles, Interquartile Range, and Outliers In Exercises 11
and 12, (a) find the quartiles, (b) find the interquartile range, and (c) identify any
outliers.


11. 40 42 48 35 45 65 46 48 41 39 47 48 46 43 49 


12. 31 48 39 40 37 49 46 34 32 30
44 41 42 49 49 35 38 37 41 35


Graphical Analysis In Exercises 13 and 14, use the box-and-whisker plot to 
identify the five-number summary. 


13. [Figure: box-and-whisker plot on a scale from 0 to 11]

14. [Figure: box-and-whisker plot labeled 500, 580, 605, 630, and 720]


Drawing a Box-and-Whisker Plot In Exercises 15-18, (a) find the five-number summary, and (b) draw a box-and-whisker plot that represents the data set.

15. 55 65 69 64 52 75 79 45 48 64 63 51 59 56 52

16. 230 240 252 228 220 262 238 228 250 260 232 242 

Be 17.4775 29 76 8 5 8 415 28 76 6 9 

BG 1%2 713 1289 9 25 473 75 4 
2359 5639 3 49 8 823 9 5 




Graphical Analysis In Exercises 19-22, use the box-and-whisker plot to determine whether the shape of the distribution represented is symmetric, skewed left, skewed right, or none of these. Justify your answer.

[Figures: box-and-whisker plots for Exercises 19-22]

Using Technology to Find Quartiles and Draw Graphs In Exercises 23-26, use technology to draw a box-and-whisker plot that represents the data set.


23. Studying The numbers of extra classes taken per week by a sample
of 32 students

[Data table not legible in the source]

24. Leaves The numbers of leaves taken by a sample of 20 executives in
a recent year

4 3 5 6 7 9 11 4 5 3
9 6 15 2 6 4 7 11 5 9

25. Hours Worked The numbers of hours worked by a sample of 30
employees in a month


160 182 195 196 174 185 135 169 168 154 
196 210 199 187 164 152 158 161 143 237 
131 211 238 132 147 195 184 164 145 191 


26. Annual Profits The annual profits (in thousands of dollars) of a
sample of 27 companies listed on a stock exchange

12.86 51.11 13.84 15.96 23.81 45.11 63.22 29.13 13.12
23.07 2.11 28.02 28.04 2.11 1.02 2.01 13.08 21.01
12.09 18.04 16.12 18.11 9.11 22.01 11.04 16.14 22.04


27. Studying Refer to the data set in Exercise 23 and the box-and-whisker plot 
you drew that represents the data set. 


(a) About 25% of the students took no more than how many extra classes 
per week? 


(b) What percent of the students took less than three extra classes per week? 


(c) You randomly select one student from the sample. What is the likelihood
that the student took more than 2 extra classes per week? Write your
answer as a percent.


28. Annual profits Refer to the data set in Exercise 26 and the box-and- 
whisker plot you drew that represents the data set. 
(a) About 50% of the companies made less than what amount of annual profits? 
(b) What percent of the companies made profits of more than $12.09 thousand? 


(c) What percent of the companies made profits between $12.09 thousand 
and $23.81 thousand? 

(d) You randomly select one company from the sample. What is the likelihood
that the company made an annual profit of more than $23.81 thousand?
Write your answer as a percent.


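FIGURE FOR EXERCISE 41: [histogram titled "Applied Statistics Test Scores"; horizontal axis: score (out of 80), marked from 48 to 78 in steps of 5; points A, B, and C marked]

FIGURE FOR EXERCISE 42: [histogram titled "Physics Test Scores"; horizontal axis: score (out of 30); points A, B, and C marked]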




Interpreting Percentiles In Exercises 29-32, use the ogive, which represents the cumulative frequency distribution for quantitative reasoning scores on the Graduate Record Examination in a recent range of years. (Adapted from Educational Testing Service)

[Figure: Quantitative Reasoning Scores ogive; horizontal axis: score, 134 to 170; vertical axis: percentile, up to 100]


29. What score represents the 70th percentile? How should you interpret this? 
30. Which score represents the 40th percentile? How should you interpret this? 
31. What percentile is a score of 140? How should you interpret this? 


32. What percentile is a score of 170? How should you interpret this? 


Finding a Percentile In Exercises 33-36, use the data set, which
represents the ages of 30 executives.

43 57 65 47 57 41 56 53 61 54
56 50 66 56 50 61 47 40 50 43
54 41 48 45 28 35 38 43 42 44


33. Find the percentile that corresponds to an age of 47 years old. 
34. Find the percentile that corresponds to an age of 57 years old. 
35. Which ages are below the 75th percentile? 
36. Which ages are above the 25th percentile? 


Finding and Interpreting Percentiles In Exercises 37-40, use the
data set, which represents wait times (in minutes) for various services at a
state’s Department of Motor Vehicles locations.


6 10 1 22 23 10672 1 6 6 2 4 14 15 16 4 
19 3 19 26 5 347 6 10 9 10 20 18 3 20 10 13 
144 11 14 17 4 27:4 8 4 3 26 18 21 1 3 3 ~=5 °5 


37. Draw an ogive to show corresponding percentiles for the data. 
38. Which wait time represents the 50th percentile? How would you interpret this? 
39. Find the percentile that corresponds to a wait time of 20 minutes. 


40. Which wait times are between the 25th and 75th percentiles? 


Graphical Analysis In Exercises 41 and 42, the midpoints A, B, and C are marked on the histogram at the left. Match them with the indicated z-scores. Which z-scores, if any, would be considered unusual?

41. z = 0, z = 2.14, z = −1.43

42. z = 0.77, z = 1.54, z = −1.54




Finding z-Scores The distribution of the ages of the winners of the Tour de 
France from 1903 to 2016 is approximately bell-shaped. The mean age is 27.9 years, 
with a standard deviation of 3.3 years. In Exercises 43-48, use the corresponding 
z-score to determine whether the age is unusual. Explain your reasoning. (Source: 
Le Tour de France) 


Winner Year Age 
43. Christopher Froome 2016 31 
44. Jan Ullrich 1997 24
45. Antonin Magne 1931 27 
46. Firmin Lambot 1922 36 
47. Henri Cornet 1904 20 
48. Christopher Froome 2013 28 


49. Life Spans of Lady Bugs The life spans of a species of lady bugs have a 
bell-shaped distribution, with a mean of 1000 days and a standard deviation 
of 100 days. 

(a) The life spans of three randomly selected lady bugs are 1250 days, 
1175 days, and 950 days. Find the z-score that corresponds to each life 
span. Determine whether any of these life spans are unusual. 

(b) The life spans of three randomly selected lady bugs are 1150 days, 
910 days, and 845 days. Using the Empirical Rule, find the percentile 
that corresponds to each life span. 

50. Life Spans of Bearings A brand of bearings has a mean life span of 15,000 
cycles, with a standard deviation of 1,250 cycles. Assume the life spans of the 
bearings have a bell-shaped distribution. 

(a) The life spans of three randomly selected bearings are 13,500 cycles, 
17,000 cycles, and 18,500 cycles. Find the z-score that corresponds to 
each life span. Determine whether any of these life spans are unusual. 

(b) The life spans of three randomly selected bearings are 12,500 cycles, 
14,750 cycles, and 19,000 cycles. Using the Empirical Rule, find the 
percentile that corresponds to each life span. 


Comparing z-Scores from Different Data Sets The table shows 
population statistics for the ages of Best Actor and Best Supporting Actor winners 
at the Academy Awards from 1929 to 2016. The distributions of the ages are 
approximately bell-shaped. In Exercises 51-54, compare the z-scores for the actors. 


Best Actor: $\mu \approx 43.7$ yr, $\sigma \approx 8.7$ yr
Best Supporting Actor: $\mu \approx 50.4$ yr, $\sigma \approx 13.8$ yr


51. Best Actor 1984: Robert Duvall, Age: 53 
Best Supporting Actor 1984: Jack Nicholson, Age: 46 


52. Best Actor 2005: Jamie Foxx, Age: 37 
Best Supporting Actor 2005: Morgan Freeman, Age: 67 


53. Best Actor 1970: John Wayne, Age: 62 
Best Supporting Actor 1970: Gig Young, Age: 56 


54. Best Actor 1982: Henry Fonda, Age: 76 
Best Supporting Actor 1982: John Gielgud, Age: 77 




Extending Concepts 


Midquartile Another measure of position is called the midquartile. You can 
find the midquartile of a data set by using the formula below. 


$$\text{Midquartile} = \frac{Q_1 + Q_3}{2}$$


In Exercises 55 and 56, find the midquartile of the data set. 

55. 5 7 1 2 3 10 8 7 5 3

56. 23 36 47 33 34 40 39 24 32 22 38 41 

57. Song Lengths Side-by-side box-and-whisker plots can be used to compare 
two or more different data sets. Each box-and-whisker plot is drawn on the 


same number line to compare the data sets more easily. The lengths (in 
seconds) of songs played at two different concerts are shown. 


[Figure: side-by-side box-and-whisker plots of song length (in seconds) on a scale from 125 to 400. Concert 1: 177, 200, 210, 220, 240. Concert 2: 200, 224, 275, 288, 390.]


(a) Describe the shape of each distribution. Which concert has less variation 
in song lengths? 

(b) Which distribution is more likely to have outliers? Explain. 

(c) Which concert do you think has a standard deviation of 16.3? Explain. 

(d) Can you determine which concert lasted longer? Explain. 


58. Credit Card Purchases The credit card purchases (rounded to the
nearest dollar) over the last three months for you and a friend are listed. 


You 60 95 102 110 130 130 162 200 215 120 124 28 
58 40 102 105 141 160 130 210 145 90 46 76 


Friend 100 125 132 90 85 75 140 160 180 190 160 105 
145 150 151 82 78 115 170 158 140 130 165 125 


Use technology to draw side-by-side box-and-whisker plots that 
represent the data sets. Then describe the shapes of the distributions. 


Modified Box-and-Whisker Plot In Exercises 59-62, (a) identify any
outliers and (b) draw a modified box-and-whisker plot that represents the data set. 
Use asterisks (*) to identify outliers. 


59. 16 9 11 12 8 10 12 13 11 10 24 9 2 15 7

60. 75 78 80 75 62 72 74 75 80 95 76 72 

61. 47 29 59 83 46 1 46 23 52 53 35 37 49 

62. 36 38 47 50 53 54 19 27 30 47 48 50 56 60 90 62 

63. Project Find a real-life data set and use the techniques of Chapter 2, 


including graphs and numerical quantities, to discuss the center, variation, 
and shape of the data set. Describe any patterns. 


USES AND ABUSES    Statistics in the Real World

Uses


Descriptive statistics help you see trends or patterns in a set of raw data. A 
good description of a data set consists of (1) a measure of the center of the 
data, (2) a measure of the variability (or spread) of the data, and (3) the 
shape (or distribution) of the data. When you read reports, news items, or 
advertisements prepared by other people, you are rarely given the raw data used 
for a study. Instead, you see graphs, measures of central tendency, and measures 
of variability. To be a discerning reader, you need to understand the terms and 
techniques of descriptive statistics. 


Abuses

Knowing how statistics are calculated can help you analyze questionable statistics. For instance, suppose you are interviewing for a sales position and the company reports that the average yearly commission earned by the five people in its sales force is $60,000. This is a misleading statement if it is based on four commissions of $25,000 and one of $200,000. The median would more accurately describe the yearly commission, but the company used the mean because it is a greater amount.

Statistical graphs can also be misleading. Compare the two time series charts at the left, which show the net profits for the Procter & Gamble Corporation from 2009 through 2016. The data are the same for each chart. The first time series chart, however, has a cropped vertical axis, which makes it appear that the net profit decreased greatly from 2009 to 2010, from 2011 to 2012, and from 2014 to 2016, and increased greatly from 2010 to 2011 and from 2012 to 2014. In the second time series chart, the scale on the vertical axis begins at zero. This time series chart correctly shows that the net profit changed modestly during this time period. (Source: Procter & Gamble Corporation)

[Figures: two time series charts titled "Procter & Gamble’s Net Profit," years 2009 to 2016; the first has a cropped vertical axis, the second has a vertical axis that starts at zero]
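The commission example can be checked with Python's `statistics` module. The check shows how one large value pulls the mean far above the median.

```python
from statistics import mean, median

# Four commissions of $25,000 and one of $200,000, as in the Abuses example.
commissions = [25_000, 25_000, 25_000, 25_000, 200_000]
print(mean(commissions))    # 60000
print(median(commissions))  # 25000
```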
Ethics 


Mark Twain helped popularize the saying, “There are three kinds of lies: lies, 
damned lies, and statistics.” In short, even the most accurate statistics can be 
used to support studies or statements that are incorrect. Unscrupulous people 
can use misleading statistics to “prove” their point. Being informed about how 
statistics are calculated and questioning the data are ways to avoid being misled. 


EXERCISES 


1. Use the Internet or some other resource to find an example of a graph that 
might lead to incorrect conclusions. 


2. You are publishing an article that discusses how drinking red wine can help 
prevent heart disease. Because drinking red wine might help people at risk for 
heart disease, you include a graph that exaggerates the effects of drinking red 
wine and preventing heart disease. Do you think it is ethical to publish this 
graph? Explain. 




Chapter Summary 


What Did You Learn? 


Section 2.1 


» How to construct a frequency distribution including limits, midpoints, relative 
frequencies, cumulative frequencies, and boundaries 


» How to construct frequency histograms, frequency polygons, relative 
frequency histograms, and ogives 


Section 2.2 


» How to graph and interpret quantitative data sets using stem-and-leaf plots 
and dot plots 


» How to graph and interpret qualitative data sets using pie charts and
Pareto charts


» How to graph and interpret paired data sets using scatter plots and time 
series charts 


Section 2.3 
» How to find the mean, median, and mode of a population and of a sample 


» How to find a weighted mean of a data set, and how to estimate the sample 
mean of grouped data 


» How to describe the shape of a distribution as symmetric, uniform, or 
skewed, and how to compare the mean and median for each 


Section 2.4 


» How to find the range of a data set, and how to find the variance and 
standard deviation of a population and of a sample 


» How to use the Empirical Rule and Chebychev's Theorem to interpret 
standard deviation 


» How to estimate the sample standard deviation for grouped data 


» How to use the coefficient of variation to compare variation in different
data sets


Section 2.5 


» How to find the first, second, and third quartiles of a data set, how to find the 
interquartile range of a data set, and how to represent a data set graphically 
using a box-and-whisker plot 


» How to interpret other fractiles such as percentiles, and how to find 
percentiles for a specific data entry 


» How to find and interpret the standard score (z-score)


Example(s)   Review Exercises
1, 2         1
3-7          2-6
1-3          7, 8
4, 5         9, 10
6, 7         11, 12
1-6          13, 14
7, 8         15-18
             19-24
1-4          25-28
5-7          29-32
8, 9         33, 34
10           35, 36
1-4          37-42
5, 6         43, 44
7, 8         45-48


Volumes (in ounces)
11.95  11.91  11.86  11.94  12.00
11.93  12.00  11.94  12.10  11.95
11.99  11.94  11.89  12.01  11.99
11.94  11.92  11.98  11.88  11.94
11.98  11.92  11.95  11.93  12.04

TABLE FOR EXERCISES 3 AND 4


Review Exercises 


Section 2.1 


In Exercises 1 and 2, use the data set, which represents the overall average
class sizes for 20 national universities. (Adapted from Public University Honors) 


37 34 42 44 39 40 41 51 49 31 
55 26 31 40 30 27 36 43 49 35 


1. Construct a frequency distribution for the data set using five classes. Include 
class limits, midpoints, boundaries, frequencies, relative frequencies, and 
cumulative frequencies. 


2. Construct a relative frequency histogram using the frequency distribution in 
Exercise 1. Then determine which class has the greatest relative frequency 
and which has the least relative frequency. 
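As a rough sketch of how the frequency distribution in Exercise 1 could be built, the Python function below (a hypothetical helper, not from the text) computes class limits, midpoints, boundaries, frequencies, relative frequencies, and cumulative frequencies for whole-number data. It assumes the class width is the range divided by the number of classes, rounded up, and that a quotient that is already a whole number is bumped to the next whole number so the maximum entry still fits.

```python
def frequency_distribution(data, num_classes):
    """Frequency table for whole-number data (illustrative helper).

    Width = range // classes + 1, i.e. the quotient rounded up, with an
    exact whole-number quotient bumped to the next whole number (assumed rule).
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) // num_classes + 1
    n = len(data)
    rows = []
    cumulative = 0
    for i in range(num_classes):
        lower = lo + i * width            # lower class limit
        upper = lower + width - 1         # upper class limit
        freq = sum(lower <= x <= upper for x in data)
        cumulative += freq
        rows.append({
            "limits": (lower, upper),
            "midpoint": (lower + upper) / 2,
            "boundaries": (lower - 0.5, upper + 0.5),
            "frequency": freq,
            "relative": freq / n,
            "cumulative": cumulative,
        })
    return width, rows


sizes = [37, 34, 42, 44, 39, 40, 41, 51, 49, 31,
         55, 26, 31, 40, 30, 27, 36, 43, 49, 35]
width, table = frequency_distribution(sizes, 5)
for row in table:
    print(row["limits"], row["midpoint"], row["boundaries"],
          row["frequency"], round(row["relative"], 2), row["cumulative"])
```

For the class-size data this gives a class width of 6 and frequencies 5, 4, 6, 3, and 2 for the classes 26-31, 32-37, 38-43, 44-49, and 50-55.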


In Exercises 3 and 4, use the data set shown in the table at the left, which
represents the actual liquid volumes (in ounces) in 25 twelve-ounce cans. 


3. Construct a frequency histogram for the data set using seven classes. 


4. Construct a relative frequency histogram for the data set using seven classes. 


In Exercises 5 and 6, use the data set, which represents the numbers of rooms
reserved during one night’s business at a sample of hotels. 


153 104 118 166 89 104 100 79 93 96 116 
94 140 84 81 96 108 111 87 126 101 111 
122 108 126 93 108 87 103 95 129 93 124 


5. Construct a frequency distribution for the data set with six classes and draw 
a frequency polygon. 


6. Construct an ogive for the data set using six classes. 


Section 2.2 


In Exercises 7 and 8, use the data set, which represents the pollution indices
for 24 U.S. cities. (Adapted from Numbeo) 


22 41 46 50 38 57 65 49 33 28 53 32 
41 23 38 65 28 36 63 54 39 43 56 39 


7. Use a stem-and-leaf plot to display the data set. Describe any patterns. 
8. Use a dot plot to display the data set. Describe any patterns. 
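A stem-and-leaf plot for Exercise 7 can be sketched in Python as follows, assuming the tens digit is the stem and the units digit is the leaf (key: 2 | 2 = 22). The function name `stem_and_leaf` is illustrative only.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Map each stem (tens digit) to its sorted leaves (units digits)."""
    plot = defaultdict(list)
    for x in data:
        stem, leaf = divmod(x, 10)
        plot[stem].append(leaf)
    # Keep empty stems between the smallest and largest so gaps stay visible.
    lo, hi = min(plot), max(plot)
    return {s: sorted(plot.get(s, [])) for s in range(lo, hi + 1)}


pollution = [22, 41, 46, 50, 38, 57, 65, 49, 33, 28, 53, 32,
             41, 23, 38, 65, 28, 36, 63, 54, 39, 43, 56, 39]
for stem, leaves in stem_and_leaf(pollution).items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
```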
In Exercises 9 and 10, use the data set, which represents the results of a survey that
asked U.S. full-time university and college students about their activities and time
use on an average weekday. (Source: Bureau of Labor Statistics)


Response         Sleeping  Leisure and Sports  Working  Educational Activities  Other
Time (in hours)  8.8       4.0                 2.3      3.5                     5.4


9. Use a pie chart to display the data set. Describe any patterns. 


10. Use a Pareto chart to display the data set. Describe any patterns. 
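For Exercises 9 and 10, each pie-chart sector's central angle is its share of the total times 360°, and a Pareto chart orders the bars from tallest to shortest. The sketch below (hypothetical helper names) computes both for the time-use data. Drawing the charts themselves is left to any plotting tool.

```python
def pie_angles(categories):
    """Central angle of each sector: (value / total) * 360 degrees."""
    total = sum(categories.values())
    return {name: value / total * 360 for name, value in categories.items()}


def pareto_order(categories):
    """Category names sorted from largest to smallest value."""
    return sorted(categories, key=categories.get, reverse=True)


time_use = {"Sleeping": 8.8, "Leisure and Sports": 4.0, "Working": 2.3,
            "Educational Activities": 3.5, "Other": 5.4}
angles = pie_angles(time_use)
for name, angle in angles.items():
    print(f"{name}: {angle:.1f} degrees")
print(pareto_order(time_use))
```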


11. The heights (in feet) and the numbers of stories of the ten tallest buildings
in New York City are listed. Use a scatter plot to display the data. Describe
any patterns. (Source: Emporis)

Height (in feet)  1776  1398  1250  1200  1079  1046  1046  1005  975  952
Stories           104   96    102   58    71    77    52    75    72   66


12. The U.S. real unemployment rates over a 12-year period are listed.
Use a time series chart to display the data. Describe any patterns.
(Source: U.S. Bureau of Labor Statistics)

Year  2005  2006  2007  2008  2009   2010
Rate  9.3%  8.4%  8.4%  9.2%  14.2%  16.7%

Year  2011   2012   2013   2014   2015   2016
Rate  16.2%  15.2%  14.5%  12.7%  11.3%  9.9%


Section 2.3 


In Exercises 13 and 14, find the mean, the median, and the mode of the data, if 
possible. If any measure cannot be found or does not represent the center of the 
data, explain why. 


13. The vertical jumps (in inches) of a sample of 10 college basketball players at
the 2016 NBA Draft Combine (Source: DraftExpress)

33.0 35.5 37.5 31.0 28.0 29.5 21.0 26.0 24.0 29.5

14. The responses of 1019 adults who were asked how much money they think they
will spend on Christmas gifts in a recent year (Adapted from Gallup)

$1000 or more: 306   $250 to $999: 336   Less than $250: 234
Not sure: 51   None/do not celebrate Christmas: 92

15. Six test scores are shown below. The first 4 test scores are 15% of the final
grade, and the last two test scores are 20% of the final grade. Find the
weighted mean of the test scores.

80 70 84 93 89 78

16. For the four test scores 96, 85, 91, and 86, the first 3 test scores are 20% of
the final grade, and the last test score is 40% of the final grade. Find the
weighted mean of the test scores.

17. Estimate the mean of the frequency distribution you made in Exercise 1.

18. The frequency distribution shows the numbers of magazine subscriptions
per household for a sample of 60 households. Find the mean number of
subscriptions per household.

Number of magazines  0   1  2   3  4  5  6
Frequency            13  9  19  8  5  2  4

19. Describe the shape of the distribution for the histogram you made in Exercise 3
as symmetric, uniform, skewed left, skewed right, or none of these.

20. Describe the shape of the distribution for the histogram you made in Exercise 4
as symmetric, uniform, skewed left, skewed right, or none of these.
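Exercises 15 through 18 all use the same idea: each value x is multiplied by a weight w, the products are summed, and the sum is divided by the sum of the weights. For a frequency distribution the weights are the frequencies. A minimal Python sketch, with illustrative function names:

```python
def weighted_mean(values, weights):
    """Sum of (x * w) divided by the sum of the weights."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)


def grouped_mean(values, frequencies):
    """Mean of a frequency distribution: sum of (x * f) divided by n = sum of f."""
    return weighted_mean(values, frequencies)


# Exercise 15: first four scores weigh 15% each, last two weigh 20% each.
ex15 = weighted_mean([80, 70, 84, 93, 89, 78], [0.15] * 4 + [0.20] * 2)
# Exercise 16: first three scores weigh 20% each, the last weighs 40%.
ex16 = weighted_mean([96, 85, 91, 86], [0.20, 0.20, 0.20, 0.40])
# Exercise 18: magazine subscriptions per household.
ex18 = grouped_mean([0, 1, 2, 3, 4, 5, 6], [13, 9, 19, 8, 5, 2, 4])
print(round(ex15, 2), round(ex16, 1), round(ex18, 2))
```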




In Exercises 21 and 22, determine whether the approximate shape of the distribution 
in the histogram is symmetric, uniform, skewed left, skewed right, or none of these. 


21. [Histogram; horizontal axis marked from 2 to 34]
22. [Histogram; horizontal axis marked from 2 to 34]


23. For the histogram in Exercise 21, which is greater, the mean or the median? 
Explain your reasoning. 


24. For the histogram in Exercise 22, which is greater, the mean or the median? 
Explain your reasoning. 


Section 2.4 


In Exercises 25 and 26, find the range, mean, variance, and standard deviation of 
the population data set. 


25. The weights (in lbs) of 14 newborn babies. 
7 5 12 12 6 9 11 4 7 6 8 7 10 9


26. The ages of the Supreme Court justices as of December 22, 2016 (Source: 
Supreme Court of the United States) 


61 80 68 83 78 66 62 56 


In Exercises 27 and 28, find the range, mean, variance, and standard deviation of 
the sample data set. 


27. The insurance claims (in dollars) from an auto insurance company. 


1514 1473 1847 1746 1545 994 883 705 
612 1204 612 585 936 1122 816 


28. Annual household expenditures (in dollars) of a random sample of university 
professors: 


37,224 40,964 43,724 36,188 38,882 38,157 39,914 37,443 
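The difference between the population formulas (Exercises 25 and 26) and the sample formulas (Exercises 27 and 28) is the divisor: N for a population, n - 1 for a sample. Python's standard `statistics` module provides both (`pvariance` and `variance`). The wrapper `describe` below is a hypothetical convenience, not part of the text.

```python
import statistics


def describe(data, population):
    """Range, mean, variance, and standard deviation.

    population=True divides squared deviations by N (sigma squared);
    population=False divides by n - 1 (s squared).
    """
    var = statistics.pvariance(data) if population else statistics.variance(data)
    return {
        "range": max(data) - min(data),
        "mean": statistics.fmean(data),
        "variance": var,
        "std_dev": var ** 0.5,
    }


justices = [61, 80, 68, 83, 78, 66, 62, 56]        # Exercise 26 (population)
expenditures = [37224, 40964, 43724, 36188,
                38882, 38157, 39914, 37443]         # Exercise 28 (sample)
pop = describe(justices, population=True)
samp = describe(expenditures, population=False)
print(pop)
print(samp)
```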


In Exercises 29 and 30, use the Empirical Rule. 


29. The mean wage for a sample of employees in a company was $18.00 per day,
with a standard deviation of $2.50 per day. Between what two values do 95%
of the data lie? (Assume the data set has a bell-shaped distribution.)


30. The mean wage for a sample of employees in a company was $16.50 per day,
with a standard deviation of $1.50 per day. Estimate the percent of wages
between $12.00 and $21.00 per day. (Assume the data set has a bell-shaped
distribution.)


31. In a certain examination, the mean score per student for 20 students is 75 
with a standard deviation of 8.5. Using Chebychev’s Theorem, determine at 
least how many of the students scored between 58 and 92. 


32. The mean duration of the 135 space shuttle flights was about 9.9 days, and 
the standard deviation was about 3.8 days. Using Chebychev’s Theorem, 
determine at least how many of the flights lasted between 2.3 days and 
17.5 days. (Source: NASA) 
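The Empirical Rule and Chebychev's Theorem both start from an interval of k standard deviations around the mean. The sketch below computes that interval and Chebychev's lower bound 1 - 1/k² (valid for k > 1). Because the theorem gives a minimum, `chebychev_min_count` rounds up to the next whole count, and it assumes the interval is symmetric about the mean. All helper names are illustrative.

```python
import math

# Approximate shares of bell-shaped data within k standard deviations.
EMPIRICAL = {1: 0.68, 2: 0.95, 3: 0.997}


def interval(mean, sd, k):
    """Endpoints of mean - k*sd and mean + k*sd."""
    return mean - k * sd, mean + k * sd


def chebychev_fraction(k):
    """At least 1 - 1/k^2 of any data set lies within k standard deviations."""
    if k <= 1:
        raise ValueError("Chebychev's Theorem is informative only for k > 1")
    return 1 - 1 / k ** 2


def chebychev_min_count(n, mean, sd, low, high):
    """Smallest whole count guaranteed in [low, high] (interval assumed symmetric)."""
    k = (high - mean) / sd
    return math.ceil(chebychev_fraction(k) * n)


print(interval(18.00, 2.50, 2), EMPIRICAL[2])    # Exercise 29
print(interval(16.50, 1.50, 3), EMPIRICAL[3])    # Exercise 30
print(chebychev_min_count(20, 75, 8.5, 58, 92))  # Exercise 31
```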




33. From a random sample of households, the numbers of televisions are listed. 
Find the sample mean and the sample standard deviation of the data. 


Number of televisions 0 1 2 3 4 5 
Number of households 1 8 13 10 5 3 
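Exercises 33 and 34 treat each value x as occurring f times. A sketch of the sample mean and sample standard deviation from such a table, dividing the sum of squared deviations by n - 1 (the helper name is assumed):

```python
import math


def grouped_sample_stats(values, freqs):
    """Sample mean and sample standard deviation from a frequency table."""
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / n
    # Each squared deviation counts once per occurrence of the value.
    ss = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))
    return mean, math.sqrt(ss / (n - 1))


# Exercise 33: televisions per household.
mean, s = grouped_sample_stats([0, 1, 2, 3, 4, 5], [1, 8, 13, 10, 5, 3])
print(round(mean, 3), round(s, 2))
```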


34. From a random sample of airplanes, the numbers of defects found in their 
fuselages are listed. Find the sample mean and the sample standard deviation 
of the data. 


Number of defects 


0 2 3 5 6 
Number of airplanes 4 2 9 3 1 


Re 


1 
5 
In Exercises 35 and 36, find the coefficient of variation for each of the two data 
sets. Then compare the results. 


35. Sample dividends (in %) paid by two companies are listed:

Company A  2.3  2.9  3.9  3.1  2.2  1.5  2.2  2.7  1.8
Company B  2.8  2.5  2.9  3.6  2.9  3.3  3.3  2.8  1.6


36. The heights (in inches) and weights (in lbs) of 8 students in a secondary 
school are listed. 


Heights 62 58 60 64 70 62 72 68 
Weights 92 80 82 106 136 96 146 138 
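The coefficient of variation, CV = (standard deviation / mean) × 100%, puts spreads measured in different units (inches and pounds in Exercise 36) on a common scale. A sketch, treating each data set as a sample:

```python
import statistics


def coefficient_of_variation(data):
    """CV = (sample standard deviation / sample mean) * 100%."""
    return statistics.stdev(data) / statistics.mean(data) * 100


heights = [62, 58, 60, 64, 70, 62, 72, 68]
weights = [92, 80, 82, 106, 136, 96, 146, 138]
cv_heights = coefficient_of_variation(heights)
cv_weights = coefficient_of_variation(weights)
print(f"Heights CV: {cv_heights:.1f}%  Weights CV: {cv_weights:.1f}%")
```

A larger CV means the data vary more relative to their mean, so here the weights vary relatively more than the heights.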


Section 2.5 


In Exercises 37-40, use the data set, which represents the model 2017 vehicles
with the highest fuel economies (in miles per gallon) in the most popular 
classes. (Source: U.S. Environmental Protection Agency) 


35 35 112 34 124 35 107 46 136 56 58 119 50 
41 25 25 22 16 16 52 22 22 22 34 30 30 


37. Find the five-number summary of the data set. 

38. Find the interquartile range of the data set. 

39. Draw a box-and-whisker plot that represents the data set. 

40. About how many vehicles fall on or below the third quartile? 
41. Find the interquartile range of the data set from Exercise 13. 
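The five-number summary for Exercises 37 through 40 can be sketched with a median-of-halves method: Q2 is the median, and Q1 and Q3 are the medians of the entries below and above it (with an odd count, the median itself is left out of both halves). Library routines such as `statistics.quantiles` use interpolation rules that can give slightly different values, so this hand-rolled helper (name illustrative) is used instead.

```python
import statistics


def five_number_summary(data):
    """(Min, Q1, Q2, Q3, Max) using medians of the lower and upper halves."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])


mpg = [35, 35, 112, 34, 124, 35, 107, 46, 136, 56, 58, 119, 50,
       41, 25, 25, 22, 16, 16, 52, 22, 22, 22, 34, 30, 30]
summary = five_number_summary(mpg)
iqr = summary[3] - summary[1]     # interquartile range = Q3 - Q1
print(summary, iqr)
```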


42. The weights (in pounds) of the defensive players on a high school football 
team are shown below. Draw a box-and-whisker plot that represents the data 
set and describe the shape of the distribution. 


173 145 205 192 197 227 156 240 172
208 185 190 167 212 228 190 184 195 


43. A worker’s income of $32 represents the 15th percentile of the incomes. 
What percentage of workers earns more than $32? 


44. As of December 2016, there were 721 adult contemporary radio stations 
in the United States. One station finds that 115 stations have a larger daily 
audience than it has. What percentile does this station come closest to in the 
daily audience rankings? (Source: Radio-Locator.com) 


The towing capacities (in pounds) of all the pickup trucks at a dealership have a 
bell-shaped distribution, with a mean of 11,830 pounds and a standard deviation 
of 2370 pounds. In Exercises 45-48, use the corresponding z-score to determine
whether the towing capacity is unusual. Explain your reasoning. 


45. 16,500 pounds 46. 5500 pounds 47. 18,000 pounds 48. 11,300 pounds 
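For Exercises 43 through 48, a z-score is z = (x - μ)/σ. Under the convention assumed here for bell-shaped data, a value with |z| > 2 is unusual and one with |z| > 3 is very unusual. The percentile of an entry is the number of entries below it divided by the total, times 100. A sketch with illustrative helper names:

```python
def z_score(x, mean, sd):
    """Standard score: how many standard deviations x lies from the mean."""
    return (x - mean) / sd


def classify(z):
    """Assumed convention: |z| > 3 very unusual, |z| > 2 unusual."""
    if abs(z) > 3:
        return "very unusual"
    if abs(z) > 2:
        return "unusual"
    return "not unusual"


def percentile_of(count_below, total):
    """Percentile = (entries less than x) / (total entries) * 100, rounded."""
    return round(count_below / total * 100)


# Exercises 45-48: towing capacities, mean 11,830 lb, sd 2370 lb.
for capacity in (16500, 5500, 18000, 11300):
    z = z_score(capacity, 11830, 2370)
    print(capacity, round(z, 2), classify(z))

# Exercise 44: 721 stations, 115 with larger audiences.
print(percentile_of(721 - 115 - 1, 721))

# Chapter Test Exercise 4: percentile of 149 movies.
movies = [121, 148, 94, 142, 170, 88, 221, 106, 18, 67,
          149, 28, 60, 101, 134, 168, 92, 154, 53, 66]
print(percentile_of(sum(x < 149 for x in movies), len(movies)))
```

For Exercise 44, 721 - 115 - 1 = 605 stations have smaller audiences, which places the station at about the 84th percentile.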




Chapter Quiz 


Take this quiz as you would take a quiz in class. After you are done, check your 
work against the answers given in the back of the book. 


1. The data set represents the numbers of minutes a sample of 27 people
exercise each week.


108 139 120 123 120 132 123 131 131 
157 150 124 111 101 135 119 116 117 
127 128 139 119 118 114 127 142 130 


(a) Construct a frequency distribution for the data set using five classes. 
Include class limits, midpoints, boundaries, frequencies, relative 
frequencies, and cumulative frequencies. 


(b) Display the data using a frequency histogram and a frequency 
polygon on the same axes. 


(c) Display the data using a relative frequency histogram. 


(d) Describe the shape of the distribution as symmetric, uniform, skewed 
left, skewed right, or none of these. 


(e) Display the data using an ogive. 
(f) Display the data using a stem-and-leaf plot. Use one line per stem. 
(g) Display the data using a box-and-whisker plot. 


2. Use frequency distribution formulas to approximate the sample mean and the
sample standard deviation of the data set in Exercise 1.


3. The elements with known properties can be classified as metals (57 elements),
metalloids (7 elements), halogens (5 elements), noble gases (6 elements), rare
earth elements (30 elements), and other nonmetals (7 elements). Display the
data using (a) a pie chart and (b) a Pareto chart.


4. Weekly salaries (in dollars) for a sample of construction workers are listed.

1100 720 1384 1124 1255 976 718 1316
749 1062 1248 891 969 790 860 1100


(a) Find the mean, median, and mode of the salaries. Which best describes 
a typical salary? 


(b) Find the range, variance, and standard deviation of the data set. 
(c) Find the coefficient of variation of the data set. 


5. The mean price of new homes from a sample of houses is $180,000 with a
standard deviation of $15,000. The data set has a bell-shaped distribution.
Using the Empirical Rule, between what two prices do 95% of the houses fall?


6. Refer to the sample statistics from Exercise 5 and determine whether any of
the house prices below are unusual. Explain your reasoning.
(a) $225,000 (b) $80,000 (c) $200,000 (d) $147,000


7. The numbers of regular season wins for each Major League Baseball
team in 2016 are listed. Display the data using a box-and-whisker plot.
(Source: Major League Baseball)


93 89 89 84 68 94 86 81 78 59 
95 86 84 74 69 95 87 79 71 68 
103 86 78 73 68 91 87 75 69 68 


Chapter Test 


Take this test as you would take a test in class. 


Certification    Number of albums
Diamond          6
Multi-Platinum   26
Platinum         42
Gold             48

TABLE FOR EXERCISE 5


1. The overall averages of 12 students in a statistics class prior to taking the final
exam are listed.
67 72 88 73 99 85 81 87 63 94 68 87 
(a) Find the mean, median, and mode of the data set. Which best represents 
the center of the data? 
(b) Find the range, variance, and standard deviation of the sample data set. 
(c) Find the coefficient of variation of the data set. 


(d) Display the data in a stem-and-leaf plot. Use one line per stem. 


2. The data set represents the numbers of movies that a sample of 20 people
watched in a year.


121 148 94 142 170 88 221 106 18 67 
149 28 60 101 134 168 92 154 53 66 


(a) Construct a frequency distribution for the data set using six classes. 
Include class limits, midpoints, boundaries, frequencies, relative 
frequencies, and cumulative frequencies. 


(b) Display the data using a frequency histogram and a frequency 
polygon on the same axes. 


(c) Display the data using a relative frequency histogram. 


(d) Describe the shape of the distribution as symmetric, uniform, 
skewed left, skewed right, or none of these. 


(e) Display the data using an ogive. 


3. Use frequency distribution formulas to estimate the sample mean and the
sample standard deviation of the data set in Exercise 2.


4. For the data set in Exercise 2, find the percentile that corresponds to
149 movies watched in a year.


5. The table lists the numbers of albums by The Beatles that received sales
certifications. Display the data using (a) a pie chart and (b) a Pareto chart.
(Source: Recording Industry Association of America)


6. The numbers of minutes it took 12 students in a statistics class to complete the
final exam are listed. Use a scatter plot to display this data set and the data
set in Exercise 1. The data sets are in the same order. Describe any patterns.

61 85 67 48 54 61 59 80 67 55 88 84


7. The data set represents the ages of 15 college professors.


46 51 60 58 37 65 40 55 30 68 28 62 56 42 59 


(a) Display the data in a box-and-whisker plot. 
(b) About what percent of the professors are over the age of 40? 


8. The mean gestational length of a sample of 208 horses is 343.7 days, with a
standard deviation of 10.4 days. The data set has a bell-shaped distribution.

(a) Estimate the number of gestational lengths between 333.3 and 354.1 days.
(b) Determine whether a gestational length of 318.4 days is unusual.


Putting it all together 


REAL DECISIONS 


You are a member of your local apartment association. The association
represents rental housing owners and managers who operate
residential rental property throughout the greater metropolitan area.
Recently, the association has received several complaints from tenants
in a particular area of the city who feel that their monthly rental fees
are much higher than those in other parts of the city.

You want to investigate the rental fees. You gather the data shown
in the table at the right. Area A represents the area of the city where
tenants are unhappy about their monthly rents. The data represent
the monthly rents paid by a random sample of tenants in Area A
and three other areas of similar size. Assume all the apartments
represented are approximately the same size with the same amenities.

The Monthly Rents (in dollars) Paid by 12 Randomly Selected
Apartment Tenants in 4 Areas of Your City

Area A       Area B  Area C  Area D
1435         1265    1221    1044
1249         1074    931     1234
1097         917     893     970
970          1213    1317    827
[illegible]  949     1034    898
1122         839     1061    914
1259         896     851     1387
1022         918     861     1166
[row illegible in source]
1187         1218    1148    1029
968          844     799     1131
1097         791     872     1047

EXERCISES

1. How Would You Do It?
(a) [Partially illegible in the source: "... from renters who are ..."]
(b) Which statistical measure do you think would best represent the
data sets for the four areas of the city?
(c) Calculate the measure from part (b) for each of the four areas.

2. Displaying the Data
(a) What type of graph would you choose to display the data?
Explain your reasoning.
(b) Construct the graph from part (a).
(c) Based on your data displays, does it appear that the monthly
rents in Area A are higher than the rents in the other areas of
the city? Explain.

3. Measuring the Data
(a) What other statistical measures in this chapter could you use
to analyze the monthly rent data?
(b) Calculate the measures from part (a).
(c) Compare the measures from part (b) with the graph you
constructed in Exercise 2. Do the measurements support
your conclusion in Exercise 2? Explain.

4. Discussing the Data
(a) Do you think the complaints in Area A are legitimate? How
do you think they should be addressed?
(b) What reasons might you give as to why the rents vary among
different areas of the city?

[Figure: Highest Monthly Rents for Two-Bedroom Apartments, median per city. Legible entries include Jersey City, NJ, $3200 and Washington, DC, $3000. (Source: Apartment List)]




TECHNOLOGY 


Parking Tickets 


According to data from the city of Toronto, Ontario, 
Canada, there were more than 180,000 parking 
infractions in the city for December 2015, with fines 
totaling over 8,500,000 Canadian dollars. 


The fines (in Canadian dollars) for a random 


sample of 105 parking infractions in Toronto, 
Ontario, Canada, for December 2015 are listed 
below. (Source: City of Toronto) 


[Data table of the 105 fines is not legible in this copy; the surviving fragment reads: 30 30 30 30 40 60 40]

[Figure: Parking Infractions by Time of Day; the legible label reads 8:00 P.M. to 11:59 P.M., 9.4%. (Source: City of Toronto)]
[Figure: Parking Infractions by Day; vertical axis: Number of infractions, marked up to 12,000. (Source: City of Toronto)]

The figures above show parking infractions in Toronto, Ontario, Canada, for December 2015 by time of day and by day.

In Exercises 1-5, use technology. If possible, print your results.

1. Find the sample mean of the data.
2. Find the sample standard deviation of the data.
3. Find the five-number summary of the data.
4. Make a frequency distribution for the data. Use a class width of 15.
5. Draw a histogram for the data. Does the distribution appear to be bell-shaped?
6. What percent of the distribution lies within one standard deviation of the mean? Within two standard deviations of the mean? Within three standard deviations of the mean?
7. Do the results of Exercise 6 agree with the Empirical Rule? Explain.
8. Do the results of Exercise 6 agree with Chebychev's Theorem? Explain.
9. Use the frequency distribution in Exercise 4 to estimate the sample mean and sample standard deviation of the data. Do the formulas for grouped data give results that are as accurate as the individual entry formulas? Explain.
10. Writing Do you think the mean or the median better represents the data? Explain your reasoning.

Extended solutions are given in the technology manuals that accompany this text. Technical instruction is provided for Minitab, Excel, and the TI-84 Plus.
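Technology Exercise 6 asks what share of the data lies within one, two, and three standard deviations of the mean. The list of 105 fines is not legible in this copy, so the sketch below is applied to the class-size data from Review Exercise 1 instead. `share_within` is a hypothetical helper.

```python
import statistics


def share_within(data, k):
    """Fraction of entries within k sample standard deviations of the mean."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    inside = sum(mean - k * s <= x <= mean + k * s for x in data)
    return inside / len(data)


sizes = [37, 34, 42, 44, 39, 40, 41, 51, 49, 31,
         55, 26, 31, 40, 30, 27, 36, 43, 49, 35]
for k in (1, 2, 3):
    print(f"within {k} sd: {share_within(sizes, k):.0%}")
```

These shares can then be compared with the Empirical Rule (about 68%, 95%, 99.7%) and with Chebychev's lower bounds (at least 75% for k = 2, about 88.9% for k = 3).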




Using Technology to Determine Descriptive Statistics

Here are some Minitab and TI-84 Plus printouts for three examples in this chapter.

[Screenshot: Minitab Graph and Stat menus, including Time Series Plot... and Display Descriptive Statistics...]

See Example 7, page 83.
[Figure: Minitab time series plot, Motor Vehicle Thefts; vertical axis: Thefts (in millions); horizontal axis: Year, 2005 to 2015]

See Example 3, page 107.
Descriptive Statistics: Recovery times

Variable        N   Mean   SE Mean  StDev  Minimum
Recovery times  12  7.500  0.544    1.883  4.000

Variable        Q1     Median  Q3     Maximum
Recovery times  6.250  7.500   9.000  10.000

See Example 4, page 127.
[Figure: Minitab plot; horizontal axis: Gallons of Fuel Wasted Per Year, marked from 10 to 35]

See Example 7, page 83.
[TI-84 Plus screens: STAT PLOTS menu; Plot1 setup with Xlist: L1 and Ylist: L2; ZOOM menu with ZoomStat]

See Example 3, page 107.
[TI-84 Plus screens: STAT CALC menu with 1-Var Stats; 1-Var Stats setup with List: L1]
TI-84 Plus 1-Var Stats output:
Σx = 90
Σx² = 714
Sx = 1.882937743
σx = 1.802775638
n = 12

See Example 4, page 127.
[TI-84 Plus screens: STAT PLOTS menu; Plot1 setup with Xlist: L1; ZOOM menu with ZoomStat]
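The TI-84 Plus 1-Var Stats output for Example 3 can be checked from the sums it reports. With Σx, Σx², and n, the sum of squared deviations is Σx² - (Σx)²/n. Dividing it by n - 1 gives the sample variance (Sx squared), and dividing by n gives the population variance (σx squared). A sketch with an illustrative function name:

```python
import math


def one_var_from_sums(sum_x, sum_x2, n):
    """Mean, sample SD (Sx), and population SD (sigma x) from the sums."""
    mean = sum_x / n
    ss = sum_x2 - sum_x ** 2 / n      # sum of squared deviations from the mean
    return mean, math.sqrt(ss / (n - 1)), math.sqrt(ss / n)


mean, sx, sigma_x = one_var_from_sums(90, 714, 12)
print(mean, sx, sigma_x)
```

The results agree with the calculator's Sx and σx and with the Minitab StDev of 1.883.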


Using Technology in Statistics 


Cumulative Review: Chapters 1 & 2


In Exercises 1 and 2, identify the sampling technique used, and discuss potential
sources of bias (if any). Explain. 


1. For quality assurance, every fortieth toothbrush is taken from each of four 
assembly lines and tested to make sure the bristles stay in the toothbrush. 


2. Using random digit dialing, researchers asked 1090 U.S. adults their level of 
education. 


3. In 2016, a worldwide study of workplace fraud found that initial detections 
of fraud resulted from a tip (39.1%), an internal audit (16.5%), management 
review (13.4%), detection by accident (5.6%), account reconciliation (5.5%), 
surveillance/monitoring (1.9%), confession (1.3%), or some other means 
(16.7%). Use a Pareto chart to organize the data. (Source: Association of 
Certified Fraud Examiners) 


In Exercises 4 and 5, determine whether the number is a parameter or a statistic. 
Explain your reasoning. 


4. In 2016, the median annual salary of a marketing account executive was 
$68,232. 


5. In a survey of 1002 U.S. adults, 88% said that fake news has caused a great 
deal of confusion or some confusion. (Source: Pew Research Center) 


6. The mean annual salary for a sample of electrical engineers is $86,500, with 
a standard deviation of $1500. The data set has a bell-shaped distribution. 


(a) Use the Empirical Rule to estimate the percent of electrical engineers 
whose annual salaries are between $83,500 and $89,500. 


(b) The salaries of three randomly selected electrical engineers are $93,500, 
$85,600, and $82,750. Find the z-score that corresponds to each salary. 
Determine whether any of these salaries are unusual. 


In Exercises 7 and 8, identify the population and the sample. 


7. A survey of 339 college and university admissions directors and enrollment 
officers found that 72% think their institution is losing potential applicants 
due to concerns about accumulating student loan debt. (Source: Gallup) 


8. A survey of 67,901 Americans ages 12 years or older found that 1.6% had 
used pain relievers for nonmedical purposes. (Source: Substance Abuse and 
Mental Health Services Administration) 


In Exercises 9 and 10, determine whether the study is an observational study or an 
experiment. Explain. 


9. To study the effect of using digital devices in the classroom on exam 
performance, researchers divided 726 undergraduate students into three 
groups, including a group that was allowed to use digital devices, a group that 
had restricted access to tablets, and a control group that was “technology- 
free.” (Source: Massachusetts Institute of Technology) 


10. In a study of 7847 children in grades 1 through 5, 15.5% have attention 
deficit hyperactivity disorder. (Source: Gallup) 




In Exercises 11 and 12, determine whether the data are qualitative or quantitative, 
and determine the level of measurement of the data set. 


11. The numbers of stolen bases during the 2016 season for Chicago Cubs 
players who stole at least one base are listed. (Source: Major League Baseball) 


as 3 2 13 12 © 2 i > Th il 


12. The six top-earning states in 2015 by median household income are listed. 
(Source: U.S. Census Bureau) 


1. New Hampshire 2. Alaska 3. Maryland 
4. Connecticut 5. Minnesota 6. New Jersey 


13. The numbers of tornadoes by state in 2016 are listed. (a) Draw a
box-and-whisker plot that represents the data set and (b) describe 
the shape of the distribution. (Source: National Oceanic and Atmospheric 
Administration) 


ol OW 3 ws 7 25 @ @ ay wy 
O i Ss 20) 4 SS) sy sil 2 wy 
2 15 44 67 23 4 47 0 2 2 
3 ij ks 3 Sl sp #& © @ 

Ki it VT 3 © Ww © G tw il 


14. Five test scores are shown below. The first 4 test scores are 15% of the final 
grade, and the last test score is 40% of the final grade. Find the weighted 
mean of the test scores. 


85 92 84 89 91

15. Tail lengths (in feet) for a sample of American alligators are listed.
6.5 3.4 4.2 7.1 5.4 6.8 7.5 3.9 4.6


(a) Find the mean, median, and mode of the tail lengths. Which best describes 
a typical American alligator tail length? Explain your reasoning. 


(b) Find the range, variance, and standard deviation of the data set. 


16. A study shows that life expectancies for Americans have increased or 
remained stable every year for the past five years. 


(a) Make an inference based on the results of the study. 


(b) What is wrong with this type of reasoning? 


In Exercises 17-19, use the data set, which represents the points scored by each
player on the Montreal Canadiens in the 2015-2016 NHL season. (Source: 
National Hockey League) 


iy i @ © 2 is 9 5 I w 
> 2 OW 12 AO 10 so 40 2 @ 
0 2 0 0 2 44 1 2 19 64 
7 ie st WM a iB sil 2 W ws 


17. Construct a frequency distribution for the data set using eight classes. 
Include class limits, midpoints, boundaries, frequencies, relative frequencies, 
and cumulative frequencies. 


18. Describe the shape of the distribution. 


19. Construct a relative frequency histogram using the frequency distribution in 
Exercise 17. Then determine which class has the greatest relative frequency 
and which has the least relative frequency. 




3.1 Basic Concepts of Probability and Counting
Activity
3.2 Conditional Probability and the Multiplication Rule
3.3 The Addition Rule
Activity
Case Study
3.4 Additional Topics in Probability and Counting
Uses and Abuses 
Real Statistics—Real Decisions 
Technology 


Nine-time Olympic gold medalist Carl Lewis tested positive for banned stimulant use
during the 1988 Olympic trials, but the results were overturned and Lewis was allowed
to compete.


Where You've Been


In Chapters 1 and 2, you learned how to collect and describe 
data. Once the data are collected and described, you can 
use the results to write summaries, draw conclusions, and 
make decisions. For instance, the International Olympic 
Committee has been testing Olympic athletes for the use of 
performance-enhancing drugs (PEDs) since 1968. Like most
lab tests, Olympic drug tests have a chance of producing 
false results. By collecting and analyzing data, you can 
determine the accuracy of these tests in order to minimize 
the chance of a false result. 


Where You're Going


Over 10,000 athletes competed in the 2016 Summer 
Olympics in Rio, Brazil. It is estimated that around 29% 
of Olympic athletes are guilty of using PEDs. An athlete is 
not penalized for PED use unless the International Olympic 
Committee is almost 100% sure the athlete is guilty. Given 
the percentage of guilty athletes and the accuracy of the 
drug test, how sure can they be? 


In Chapter 3, you will learn how to determine the probability 
of an event. For instance, you can find the probability 
P(guilty|fail) that an athlete is guilty of using PEDs, given 
that the athlete failed the drug test. This type of probability 
is a conditional probability, which will be discussed in 
Section 3.2. 


Assume a drug test has been shown to be sensitive enough 
that an athlete using PEDs has a 90% chance of testing 
positive. In other words, the drug test has a sensitivity 
of 0.9. You can use this percentage and the estimated overall 
percentage of guilty athletes to calculate P(guilty|fail).


The table below shows the probabilities P(guilty|fail) for 
drug tests with different sensitivities. 


Drug Test Sensitivity   P(guilty|fail)
0.9                     0.79
0.95                    0.89
0.99                    0.98
0.999                   0.998
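The probabilities in the table can be reproduced with Bayes' rule, combining the test's sensitivity with the estimated 29% share of guilty athletes. The passage does not state the test's false-positive rate. The sketch below assumes it equals 1 - sensitivity (that is, the test is equally accurate for clean athletes), and under that assumption it matches every row of the table. The function name is illustrative.

```python
def p_guilty_given_fail(sensitivity, prevalence=0.29, false_positive_rate=None):
    """P(guilty | fail) by Bayes' rule.

    Assumption: when no false-positive rate is given, use 1 - sensitivity.
    """
    if false_positive_rate is None:
        false_positive_rate = 1 - sensitivity
    true_fail = sensitivity * prevalence                  # P(guilty and fail)
    false_fail = false_positive_rate * (1 - prevalence)   # P(clean and fail)
    return true_fail / (true_fail + false_fail)


for sens in (0.9, 0.95, 0.99, 0.999):
    print(sens, round(p_guilty_given_fail(sens), 3))
```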




152 CHAPTER 3. Probability 


3.1 Basic Concepts of Probability and Counting


What You Should Learn 


» How to identify the sample space of a probability experiment and how to identify simple events

» How to use the Fundamental Counting Principle to find the number of ways two or more events can occur

» How to distinguish among classical probability, empirical probability, and subjective probability

» How to find the probability of the complement of an event

» How to use a tree diagram and the Fundamental Counting Principle to find probabilities


Study Tip 


Here is a simple example 
of the use of the terms 
probability experiment, 
sample space, event, 
and outcome. 


Probability Experiment: 
Roll a six-sided die.
Sample Space: 
{1, 2, 3, 4, 5, 6} 
Event: 
Roll an even number, 
{2, 4, 6}. 
Outcome: 
Roll a 2, {2}. 


Probability Experiments • The Fundamental Counting Principle • Types of Probability • Complementary Events • Probability Applications


Probability Experiments 


When weather forecasters say that there is a 90% chance of rain or a physician 
says there is a 35% chance for a successful surgery, they are stating the 
likelihood, or probability, that a specific event will occur. Decisions such as 
“should you go golfing” or “should you proceed with surgery” are often based 
on these probabilities. In the preceding chapter, you learned about the role of 
the descriptive branch of statistics. The second branch, inferential statistics, 
has probability as its foundation, so it is necessary to learn about probability 
before proceeding. 


DEFINITION 


A probability experiment is an action, or trial, through which specific
results (counts, measurements, or responses) are obtained. The result of a
single trial in a probability experiment is an outcome. The set of all possible
outcomes of a probability experiment is the sample space. An event is a
subset of the sample space. It may consist of one or more outcomes.


EXAMPLE 1


Identifying the Sample Space of a Probability Experiment


A survey consists of asking people for their blood types (O, A, B, and AB), 
including whether they are Rh-positive or Rh-negative. Determine the number 
of outcomes and identify the sample space. 


SOLUTION 


There are four blood types: O, A, B, and AB. Each person is either
Rh-positive or Rh-negative. A tree diagram gives a visual display of the
outcomes of a probability experiment by using branches that originate from
a starting point. It can be used to find the number of possible outcomes in a
sample space as well as individual outcomes.


Tree Diagram for Blood Types (O, A, B, and AB each branch into + and -): O+, O-, A+, A-, B+, B-, AB+, AB-


From the tree diagram, you can see that the sample space has eight possible 
outcomes, which are listed below. 


{O+, O-, A+, A-, B+, B-, AB+, AB-}   Sample space
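The tree diagram is equivalent to pairing every blood type with every Rh factor. A minimal Python sketch (the list names are illustrative) builds the same sample space with `itertools.product`:

```python
from itertools import product

blood_types = ["O", "A", "B", "AB"]
rh_factors = ["+", "-"]

# Every (blood type, Rh factor) pair is one path through the tree diagram.
sample_space = [bt + rh for bt, rh in product(blood_types, rh_factors)]

print(len(sample_space))  # 8
print(sample_space)       # ['O+', 'O-', 'A+', 'A-', 'B+', 'B-', 'AB+', 'AB-']
```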


SURVEY


Does your favorite team’s win or loss affect your mood?


Check one response:

□ Yes
□ No
□ Not sure


Source: Rasmussen


Tree Diagram for Coin and Die Experiment (H and T each branch into 1, 2, 3, 4, 5, 6): H1, H2, H3, H4, H5, H6, T1, T2, T3, T4, T5, T6


SECTION 3.1 Basic Concepts of Probability and Counting 153 


TRY IT YOURSELF 1 


For each probability experiment, determine the number of outcomes and 
identify the sample space. 


1. A probability experiment consists of recording a response to the survey 
statement at the left and the gender of the respondent. 


2. A probability experiment consists of recording a response to the survey 
statement at the left and the age (18-34, 35-49, 50 and older) of the 
respondent. 


3. A probability experiment consists of recording a response to the survey 
statement at the left and the geographic location (Northeast, South, 
Midwest, West) of the respondent. 

Answer: Page A33 


In the rest of this chapter, you will learn how to calculate the probability or 
likelihood of an event. Events are often represented by uppercase letters, such as 
A, B, and C. An event that consists of a single outcome is called a simple event. 
For instance, consider a probability experiment that consists of tossing a coin and 
then rolling a six-sided die, as shown in the tree diagram at the left. The event 
“tossing heads and rolling a 3” is a simple event and can be represented as 


A = {H3}. Event A has one outcome, so it is a simple event. 


In contrast, the event “tossing heads and rolling an even number” is not simple 
because it consists of three possible outcomes and can be represented as 


B = {H2, H4, H6}. Event B has more than one outcome, so it is not a simple event.
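One way to decide whether an event is simple is to list the sample space and count the outcomes in the event. The Python sketch below assumes outcomes are written as the coin face followed by the die value, as in the tree diagram; `is_simple` is a hypothetical helper, not notation from the text.

```python
from itertools import product

# Outcomes such as "H3": coin face followed by the die value
sample_space = [coin + str(die) for coin, die in product("HT", range(1, 7))]


def is_simple(event):
    """A simple event consists of exactly one outcome."""
    return len(event) == 1


event_a = {o for o in sample_space if o == "H3"}  # heads and a 3
event_b = {o for o in sample_space
           if o[0] == "H" and int(o[1:]) % 2 == 0}  # heads and an even number

print(len(sample_space))                    # 12
print(is_simple(event_a), sorted(event_b))  # True ['H2', 'H4', 'H6']
```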


EXAMPLE 2


Identifying Simple Events


Determine the number of outcomes in each event. Then decide whether each 
event is simple or not. Explain your reasoning. 


1. For quality control, you randomly select a machine part from a batch that 
has been manufactured that day. Event A is selecting a specific defective 
machine part. 


2. You roll a six-sided die. Event B is rolling at least a 4. 


SOLUTION 


1. Event A has only one outcome: choosing the specific defective machine 
part. So, the event is a simple event. 


2. Event B has three outcomes: rolling a 4, a 5, or a 6. Because the event has 
more than one outcome, it is not simple. 


TRY IT YOURSELF 2 


You ask for a student’s age at his or her last birthday. Determine the number 
of outcomes in each event. Then decide whether each event is simple or not. 
Explain your reasoning. 


1. Event C: The student’s age is between 18 and 23, inclusive. 


2. Event D: The student’s age is 20. 
Answer: Page A34 




The Fundamental Counting Principle 


In some cases, an event can occur in so many different ways that it is not practical 
to write out all the outcomes. When this occurs, you can rely on the Fundamental 
Counting Principle. The Fundamental Counting Principle can be used to find the 
number of ways two or more events can occur in sequence. 


The Fundamental Counting Principle 


If one event can occur in m ways and a second event can occur in n ways, then
the number of ways the two events can occur in sequence is m · n. This rule
can be extended to any number of events occurring in sequence.


In words, the number of ways that events can occur in sequence is found by 
multiplying the number of ways one event can occur by the number of ways the 
other event(s) can occur. 


EXAMPLE 3


Using the Fundamental Counting Principle


You are purchasing a new car. The possible manufacturers, car sizes, and 
colors are listed in the table. 


Manufacturer | Car size | Color
Ford | compact | white (W)
GM | midsize | red (R)
Honda | | black (B)
 | | green (G)


How many different ways can you select one manufacturer, one car size, and 
one color? Use a tree diagram to check your result. 


SOLUTION 


There are three choices of manufacturers, two choices of car sizes, and 
four choices of colors. Using the Fundamental Counting Principle, you can 
determine that the number of ways to select one manufacturer, one car size, 
and one color is 


3 · 2 · 4 = 24 ways.

Using a tree diagram, you can see why there are 24 options.


Tree Diagram for Car Selections (Ford, GM, and Honda each branch into compact and midsize; each car size then branches into W, R, B, and G, giving 24 end points)
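The counting principle and the tree diagram should always agree: multiplying the number of choices at each stage gives the same count as listing every path. A short Python check using the choices from the table (`math.prod` needs Python 3.8 or later):

```python
from itertools import product
from math import prod

manufacturers = ["Ford", "GM", "Honda"]
car_sizes = ["compact", "midsize"]
colors = ["W", "R", "B", "G"]  # white, red, black, green

# Fundamental Counting Principle: multiply the number of choices at each stage
by_counting = prod(len(stage) for stage in (manufacturers, car_sizes, colors))

# Tree diagram: list every path (manufacturer, size, color) and count them
by_listing = len(list(product(manufacturers, car_sizes, colors)))

print(by_counting, by_listing)  # 24 24
```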


TRY IT YOURSELF 3 


You add another manufacturer, Toyota, and another color, tan, to the choices 
in Example 3. How many different ways can you select one manufacturer, one 
car size, and one color? Use a tree diagram to check your result. 

Answer: Page A34 


TI-84 PLUS




EXAMPLE 4


Using the Fundamental Counting Principle


The access code for a car’s security system consists of four digits. Each digit can 
be any number from 0 through 9. 


Access Code: 1st digit | 2nd digit | 3rd digit | 4th digit

How many access codes are possible when 
1. each digit can be used only once and not repeated? 
2. each digit can be repeated? 
3. each digit can be repeated but the first digit cannot be 0 or 1? 
SOLUTION 


1. Because each digit can be used only once, there are 10 choices for the first
digit, 9 choices left for the second digit, 8 choices left for the third digit,
and 7 choices left for the fourth digit. Using the Fundamental Counting
Principle, you can conclude that there are

10 · 9 · 8 · 7 = 5040

possible access codes.


2. Because each digit can be repeated, there are 10 choices for each of the four
digits. So, there are

10 · 10 · 10 · 10 = 10^4 = 10,000

possible access codes.


3. Because the first digit cannot be 0 or 1, there are 8 choices for the first
digit. Then there are 10 choices for each of the other three digits. So,
there are

8 · 10 · 10 · 10 = 8000

possible access codes.
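Besides a graphing calculator such as the TI-84 Plus, the three counts can be verified in Python. This sketch computes each count with the Fundamental Counting Principle and then confirms it by enumerating the four-digit codes themselves; `math.perm` requires Python 3.8 or later.

```python
from itertools import permutations, product
from math import perm

digits = range(10)

# Counts from the Fundamental Counting Principle
no_repeats = perm(10, 4)        # 10 * 9 * 8 * 7
with_repeats = 10 ** 4          # 10 * 10 * 10 * 10
first_not_0_or_1 = 8 * 10 ** 3  # 8 choices, then 10 for each other digit

# Brute-force check: enumerate the codes themselves
all_codes = list(product(digits, repeat=4))
assert len(list(permutations(digits, 4))) == no_repeats
assert len(all_codes) == with_repeats
assert sum(1 for code in all_codes if code[0] not in (0, 1)) == first_not_0_or_1

print(no_repeats, with_repeats, first_not_0_or_1)  # 5040 10000 8000
```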


Remember that you can use technology to check your answers. For instance, at 
the left, a TI-84 Plus was used to check the results in Example 4. 


TRY IT YOURSELF 4 

How many license plates can you make when a license plate consists of 
1. six (out of 26) alphabetical letters, each of which can be repeated? 

2. six (out of 26) alphabetical letters, each of which cannot be repeated? 


3. six (out of 26) alphabetical letters, each of which can be repeated but the 
first letter cannot be A, B, C, or D? 


4. one digit (any number 1 through 9) and five (out of 26) alphabetical letters, 
each of which can be repeated? 
Answer: Page A34 




Types of Probability 


The method you will use to calculate a probability depends on the type of
probability. There are three types of probability: classical probability, empirical
probability, and subjective probability. The probability that event E will occur is
written as P(E) and is read as “the probability of event E.”


Study Tip

Probabilities can be written as fractions, decimals, or percents. In Example 5,
the probabilities are written as reduced fractions and decimals, with decimals
rounded to three decimal places when possible. For very small probabilities,
round to the first nonzero digit. For example, 0.0000271 would be 0.00003.
In general, these round-off rules will be used throughout the text. (Note that
some results may be rounded differently for accuracy.)


DEFINITION


Classical (or theoretical) probability is used when each outcome in a sample
space is equally likely to occur. The classical probability for an event E is
given by

P(E) = (Number of outcomes in event E) / (Total number of outcomes in sample space).
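The classical formula translates directly into code. The sketch below is one possible Python version, assuming the sample space is supplied as a list of equally likely outcomes; `classical_probability` is a hypothetical name. It uses `fractions.Fraction` so results stay as reduced fractions, and it applies the function to the die events of Example 5.

```python
from fractions import Fraction


def classical_probability(event, sample_space):
    """P(E) = (number of outcomes in E) / (total outcomes in the sample space).

    Assumes every outcome in sample_space is equally likely.
    """
    favorable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favorable), len(sample_space))


die = [1, 2, 3, 4, 5, 6]
p_a = classical_probability({3}, die)                        # rolling a 3
p_b = classical_probability({7}, die)                        # rolling a 7
p_c = classical_probability({x for x in die if x < 5}, die)  # less than 5

for p in (p_a, p_b, p_c):
    print(p, round(float(p), 3))
```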


EXAMPLE 5


Finding Classical Probabilities
You roll a six-sided die. Find the probability of each event. 


1. Event A: rolling a 3


2. Event B: rolling a 7 


3. Event C: rolling a number less than 5 


SOLUTION 


When a six-sided die is rolled, the sample space consists of six outcomes:
{1, 2, 3, 4, 5, 6}. Because each outcome in the sample space is equally likely to
occur, you can use the formula for classical probability.


1. There is one outcome in event A = {3}. So,

P(rolling a 3) = 1/6 ≈ 0.167.   Round to three decimal places.

The probability of rolling a 3 is 1/6, or about 0.167.


2. Because 7 is not in the sample space, there are no outcomes in event B. So,

P(rolling a 7) = 0/6 = 0.   Event is not possible.

The probability of rolling a 7 is 0, so it is not possible for the event to occur.


3. There are four outcomes in event C = {1, 2, 3, 4}. So,

P(rolling a number less than 5) = 4/6 = 2/3 ≈ 0.667.

The probability of rolling a number less than 5 is 2/3, or about 0.667.


Standard Deck of Playing Cards (52 cards in four suits: hearts, diamonds, spades, and clubs; each suit contains A, K, Q, J, 10, 9, 8, 7, 6, 5, 4, 3, 2)


TRY IT YOURSELF 5


You select a card from a standard deck of playing cards. Find the probability
of each event.

1. Event D: Selecting the nine of clubs

2. Event E: Selecting a heart

3. Event F: Selecting a diamond, heart, club, or spade

Answer: Page A34


Picturing the World


It seems that no matter how strange an event is, somebody wants to know
the probability that it will occur. The table below lists the probabilities that
some intriguing events will happen. (Adapted from Life: The Odds)


Event | Probability
Being audited by the IRS |
Writing a New York Times best seller | 0.005
Winning an Academy Award |
Having your identity stolen | 0.5%
Spotting a UFO | 0.0000003


Which of these events is most likely to occur? Least likely?


To explore this topic further, see Activity 3.1 on page 168.




When an experiment is repeated many times, regular patterns are formed. 
These patterns make it possible to find empirical probability. Empirical 
probability can be used even when each outcome of an event is not equally likely 
to occur. 


DEFINITION 


Empirical (or statistical) probability is based on observations obtained from
probability experiments. The empirical probability of an event E is the
relative frequency of event E.


P(E) = (Frequency of event E) / (Total frequency) = f/n


Note that n = Σf.


EXAMPLE 6


Finding Empirical Probabilities

A company is conducting an online survey of randomly selected U.S. adults 
to determine how they read books during the past year, if at all. So far, 
1490 adults have been surveyed. The pie chart shows the results. (Note that 
digital books include ebooks as well as audio books.) What is the probability 
that the next adult surveyed read only print books during the last year? 
(Pew Research Center, September 2016, “Book Reading 2016”) 


Book Reading by U.S. Adults (pie chart of the survey responses, including “Read only print books,” “Read only digital books,” and “Read both print and digital books”)


SOLUTION 


Note that the responses are not equally likely to occur and are based on 
observations. So, you cannot use the formula for classical probability, but 
you can use the formula for empirical probability. The event is a response of 
“read only print books.” The frequency of this event is 578. The total of the 
frequencies is 


n = 578 + 91 + 426 + 395   Add frequency of each response.
  = 1490.                  Total frequency


The empirical probability that the response of the next adult is “read only print
books” is


P(read only print books) = 578/1490   Find empirical probability.
                         ≈ 0.388.     Round to three decimal places.
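A minimal Python sketch of the empirical formula P(E) = f/n, applied to Example 6. The helper name is hypothetical. Of the four frequencies, only 578 is identified in the text (as “read only print books”), so the others are used here only to form the total n.

```python
from fractions import Fraction


def empirical_probability(frequency_of_event, frequencies):
    """P(E) = f / n, the relative frequency of E, with n the sum of all frequencies."""
    n = sum(frequencies)  # n = Σf
    return Fraction(frequency_of_event, n)


# The four response frequencies from Example 6; 578 is "read only print books"
frequencies = [578, 91, 426, 395]
p_print_only = empirical_probability(578, frequencies)

print(sum(frequencies), round(float(p_print_only), 3))  # 1490 0.388
```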


TRY IT YOURSELF 6 


In Example 6, determine the probability that the next adult surveyed read only 
digital books during the last year. Answer: Page A34 




Ages | Frequency, f
18 to 22 | 156
23 to 35 | 312
36 to 49 | 254
50 to 64 | 195
65 and over | 58
Σf = 975


EXAMPLE 7


Using a Frequency Distribution to Find Probabilities


A company is conducting a phone survey of randomly selected individuals to 
determine the ages of social networking site users. So far, 975 social networking 
site users have been surveyed. The frequency distribution at the right shows 
the results. What is the probability that the next user surveyed is 23 to 35 years 
old? (Adapted from Pew Research Center) 


SOLUTION 


Because the responses are not equally likely to occur and are based on 
observations, use the formula for empirical probability. The event is a response 
of “23 to 35 years old.” The frequency of this event is 312. Because the total 
of the frequencies is 975, the empirical probability that the next user is 23 to 
35 years old is 

P(age 23 to 35) = 312/975 = 0.32.


TRY IT YOURSELF 7 


Find the probability that the next user surveyed is 36 to 49 years old. 
Answer: Page A34 


As you increase the number of times a probability experiment is repeated, 
the empirical probability (relative frequency) of an event approaches the 
theoretical probability of the event. This is known as the law of large numbers. 


Law of Large Numbers 


As an experiment is repeated over and over, the empirical probability of an 
event approaches the theoretical (actual) probability of the event. 


As an example of this law, suppose you want to determine the probability
of tossing a head with a fair coin. You toss the coin 10 times and get 3 heads,
so you obtain an empirical probability of 3/10. Because you tossed the coin
only a few times, your empirical probability is not representative of the
theoretical probability, which is 1/2. The law of large numbers tells you that the
empirical probability after tossing the coin several thousand times is very likely
to be close to the theoretical or actual probability.

The scatter plot below shows the results of simulating a coin toss 150 times.
Notice that, as the number of tosses increases, the empirical probability (the
proportion of tosses that are heads) tends to get closer to the theoretical
probability of 0.5.


Probability of Tossing a Head (scatter plot: proportion that are heads, on a vertical axis up to 1.0, versus number of tosses, on a horizontal axis marked 30, 60, 90, 120, 150)
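A simulation like the one in the plot can be sketched in Python. This is an illustration only, assuming a fair coin modeled with `random.Random.random()`; the seed and the function name are arbitrary choices, so the printed proportions will not match the plot in the text exactly.

```python
import random


def running_proportion_of_heads(num_tosses, seed=None):
    """Simulate tosses of a fair coin and return the proportion of heads
    observed after each toss (the empirical probability so far)."""
    rng = random.Random(seed)
    heads = 0
    proportions = []
    for toss in range(1, num_tosses + 1):
        if rng.random() < 0.5:  # a uniform draw below 0.5 counts as heads
            heads += 1
        proportions.append(heads / toss)
    return proportions


# 150 simulated tosses, as in the plot; print a few checkpoints
proportions = running_proportion_of_heads(150, seed=1)
for n in (10, 30, 60, 90, 120, 150):
    print(n, round(proportions[n - 1], 3))
```

Rerunning with a much larger `num_tosses` (say 100,000) typically leaves the final proportion within a few thousandths of 0.5, which is the law of large numbers at work.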




The third type of probability is subjective probability. Subjective probabilities 
result from intuition, educated guesses, and estimates. For instance, given a 
patient’s health and extent of injuries, a doctor may feel that the patient has a 
90% chance of a full recovery. Or a business analyst may predict that the chance 
of the employees of a certain company going on strike is 0.25. 


EXAMPLE 8 


Classifying Types of Probability 

Classify each statement as an example of classical probability, empirical 
probability, or subjective probability. Explain your reasoning. 

1. The probability that you will get an A on your next test is 0.9. 


2. The probability that a voter chosen at random will be younger than 35 years 
old is 0.3. 


3. The probability of winning a 1000-ticket raffle with one ticket is 1/1000.


SOLUTION 


1. This probability is most likely based on an educated guess. It is an example 
of subjective probability. 


2. This statement is most likely based on a survey of a sample of voters, so it 
is an example of empirical probability. 


3. Because you know the number of outcomes and each is equally likely, this 
is an example of classical probability. 


TRY IT YOURSELF 8 


Based on previous counts, the probability of a salmon successfully passing 
through a dam on the Columbia River is 0.85. Is this statement an example of 
classical probability, empirical probability, or subjective probability? (Source: 
Army Corps of Engineers) 

Answer: Page A34 


A probability cannot be negative or greater than 1, as stated in the rule 
below. 


Range of Probabilities Rule 


The probability of an event E is between 0 and 1, inclusive. That is,
0 ≤ P(E) ≤ 1.


When the probability of an event is 1, the event is certain to occur. When the 
probability of an event is 0, the event is impossible. A probability of 0.5 indicates 
that an event has an even chance of occurring or not occurring. 

The figure below shows the possible range of probabilities and their meanings. 


Impossible | Unlikely | Even chance | Likely | Certain
0 | 0.25 | 0.5 | 0.75 | 1


An event that occurs with a probability of 0.05 or less is typically considered 
unusual. Unusual events are highly unlikely to occur. Later in this course you will 
identify unusual events when studying inferential statistics. 
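The Range of Probabilities Rule and the 0.05 threshold for unusual events can be combined in a small check. This Python sketch is illustrative; `is_unusual` and its `cutoff` parameter are hypothetical names, with the default 0.05 taken from the typical threshold given above.

```python
def is_unusual(p, cutoff=0.05):
    """Return True when an event with probability p is unusual (p <= cutoff).

    Raises ValueError when p breaks the Range of Probabilities Rule,
    0 <= P(E) <= 1.
    """
    if not 0 <= p <= 1:
        raise ValueError(f"{p} cannot be a probability")
    return p <= cutoff


print(is_unusual(0.005))  # True
print(is_unusual(0.75))   # False
```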




Sample Space 


The area of the rectangle represents 
the total probability of the sample 
space (1 = 100%). The area of the 
circle represents the probability of 
event E, and the area outside the 
circle represents the probability of 
the complement of event E. 


Complementary Events 


The sum of the probabilities of all outcomes in a sample space is 1 or 100%. An 
important result of this fact is that when you know the probability of an event E, 
you can find the probability of the complement of event E. 


DEFINITION 


The complement of event E is the set of all outcomes in a sample space that
are not included in event E. The complement of event E is denoted by E’ and
is read as “E prime.” The Venn diagram at the left illustrates the relationship
between the sample space, event E, and its complement E’.


For instance, when you roll a die and let E be the event “the number is
at least 5,” the complement of E is the event “the number is less than 5.” In
symbols, E = {5, 6} and E’ = {1, 2, 3, 4}.

Using the definition of the complement of an event and the fact that the sum 
of the probabilities of all outcomes is 1, you can determine the formulas below. 

P(E) + P(E’) = 1
P(E) = 1 - P(E’)
P(E’) = 1 - P(E)


EXAMPLE 9


Finding the Probability of the Complement of an Event


The frequency distribution from Example 7 is shown below. Find the 
probability of randomly selecting a social networking site user who is not 23 
to 35 years old. 


Ages | Frequency, f
18 to 22 | 156
23 to 35 | 312
36 to 49 | 254
50 to 64 | 195
65 and over | 58
Σf = 975


SOLUTION

From Example 7, you know that

P(age 23 to 35) = 312/975 = 0.32.


So, the probability that a user is not 23 to 35 years old is


P(age is not 23 to 35) = 1 - 312/975 = 663/975 = 0.68.
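The same complement calculation in Python, as a sketch using the Example 7 frequencies (the dictionary keys are illustrative labels). Adding up the other four age groups directly gives the same result, 663/975, which is a useful check on the formula P(E’) = 1 - P(E).

```python
from fractions import Fraction

# Frequency distribution from Example 7
frequencies = {"18 to 22": 156, "23 to 35": 312, "36 to 49": 254,
               "50 to 64": 195, "65 and over": 58}
n = sum(frequencies.values())

p_event = Fraction(frequencies["23 to 35"], n)  # P(E)
p_complement = 1 - p_event                      # P(E') = 1 - P(E)

# Check: adding up the other age groups directly gives the same probability
p_other_groups = Fraction(
    sum(f for age, f in frequencies.items() if age != "23 to 35"), n)

print(n, p_complement, float(p_complement))  # 975 17/25 0.68
```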


TRY IT YOURSELF 9 


Use the frequency distribution in Example 7 to find the probability of 
randomly selecting a user who is not 18 to 22 years old. 
Answer: Page A34 


Spinner (eight equal sections numbered 1 through 8)


Tree Diagram for Coin and Spinner Experiment (H and T each branch into 1 through 8): H1, H2, H3, H4, H5, H6, H7, H8, T1, T2, T3, T4, T5, T6, T7, T8




Probability Applications 


EXAMPLE 10


Using a Tree Diagram


A probability experiment consists of tossing a coin and spinning the spinner 
shown at the left. The spinner is equally likely to land on each number. Use a 
tree diagram to find the probability of each event. 


1. Event A: tossing a tail and spinning an odd number 


2. Event B: tossing a head or spinning a number greater than 3 


SOLUTION 


From the tree diagram at the left, you can see that there are 16 outcomes. The 
outcomes are equally likely to occur, so use the formula for classical probability. 


1. There are four outcomes in event A = {T1, T3, T5, T7}. So,


P(tossing a tail and spinning an odd number) = 4/16 = 1/4 = 0.25.


2. There are 13 outcomes in event B = {H1, H2, H3, H4, H5, H6, H7, H8, T4,
T5, T6, T7, T8}. So,


P(tossing a head or spinning a number greater than 3) = 13/16 ≈ 0.813.
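Listing the 16 outcomes programmatically gives the same answers. This Python sketch assumes the spinner's sections are numbered 1 through 8, as the tree diagram shows; `probability` is a hypothetical helper. Note that Python prints 13/16 exactly as 0.8125, which the text rounds to 0.813.

```python
from fractions import Fraction
from itertools import product

# 2 coin faces x 8 spinner numbers = 16 equally likely outcomes
sample_space = list(product("HT", range(1, 9)))


def probability(condition):
    """Classical probability of the outcomes (coin, number) satisfying condition."""
    favorable = sum(1 for coin, number in sample_space if condition(coin, number))
    return Fraction(favorable, len(sample_space))


p_a = probability(lambda coin, number: coin == "T" and number % 2 == 1)
p_b = probability(lambda coin, number: coin == "H" or number > 3)

print(len(sample_space), p_a, p_b, float(p_b))  # 16 1/4 13/16 0.8125
```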


TRY IT YOURSELF 10 


Find the probability of tossing a tail and spinning a number less than 6. 
Answer: Page A34 


EXAMPLE 11


Using the Fundamental Counting Principle


Your college identification number consists of eight digits. Each digit can be 
0 through 9 and each digit can be repeated. What is the probability of getting 
your college identification number when randomly generating eight digits? 


SOLUTION 


Because each digit can be repeated, there are 10 choices for each of the 8 digits. 
So, using the Fundamental Counting Principle, there are 


10 · 10 · 10 · 10 · 10 · 10 · 10 · 10 = 10^8 = 100,000,000


possible identification numbers. But only one of those numbers corresponds to
your college identification number. So, the probability of randomly generating
8 digits and getting your college identification number is


1/100,000,000, or 0.00000001.
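The same arithmetic as a quick Python sketch (the variable names are illustrative):

```python
from fractions import Fraction

num_digits = 8
possible_ids = 10 ** num_digits      # 10 choices for each of the 8 digits
p_match = Fraction(1, possible_ids)  # exactly one of them is your number

print(f"{possible_ids:,}", float(p_match))  # 100,000,000 1e-08
```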


TRY IT YOURSELF 11 


Your college identification number consists of nine digits. The first two digits 
of the number will be the last two digits of the year you are scheduled to 
graduate. The other digits can be any number from 0 through 9, and each digit 
can be repeated. What is the probability of getting your college identification 
number when randomly generating the other seven digits? 

Answer: Page A34 




3.1 EXERCISES 


For Extra Help: MyLab Statistics 


Building Basic Skills and Vocabulary 
1. What is the difference between an outcome and an event? 


2. Determine whether each number could represent the probability of an 
event. Explain your reasoning. 


(a)   (b) 333.3%   (c) 2.3   (d) -0.0004   (e) 0   (f)

3. Explain why the statement is incorrect: The probability of rain is 150%.

4. When you use the Fundamental Counting Principle, what are you counting?

5. Describe the law of large numbers in your own words. Give an example.

6. List the three formulas that can be used to describe complementary events.


True or False? In Exercises 7-10, determine whether the statement is true or
false. If it is false, rewrite it as a true statement.


7. You are taking a test that has true or false and multiple choice questions. 
The event “choosing false on a true or false question and choosing A or B 
on a multiple choice question” is a simple event. 


8. You toss a fair coin nine times and it lands tails up each time. The probability 
it will land heads up on the tenth toss is greater than 0.5. 


9. A probability of i indicates an unusual event. 


10. When an event is almost certain to happen, its complement will be an 
unusual event. 


Matching Probabilities In Exercises 11-16, match the event with its
probability.


(a) 0.95   (b) 0.005   (c) 0.25   (d) 0   (e) 0.375   (f) 0.5


11. A random number generator is used to select a number from 1 to 100. What 
is the probability of selecting the number 153? 


12. A random number generator is used to select a number from 1 to 100. What 
is the probability of selecting an even number? 


13. You randomly select a number from 0 to 9 and then randomly select a 
number from 0 to 19. What is the probability of selecting a 3 both times? 


14. A game show contestant must randomly select a door. One door doubles her 
money while the other three doors leave her with no winnings. What is the 
probability she selects the door that doubles her money? 


15. Five of the 100 digital video recorders (DVRs) in an inventory are known to 
be defective. What is the probability you randomly select a DVR that is not 
defective? 


16. You toss a coin four times. What is the probability of tossing tails exactly half 
of the time? 


Finding the Probability of the Complement of an Event In Exercises
17-20, the probability that an event will happen is given. Find the probability that
the event will not happen.


17. P(E) = 3   18. P(E) = 0.55   19. P(E) = 0.03   20. P(E) = 5




Finding the Probability of an Event In Exercises 21-24, the probability that
an event will not happen is given. Find the probability that the event will happen.


21. P(E’) =0.95 22. P(E’) =0.13 23. P(E’) =} 24. P(E’) =H 


Using and Interpreting Concepts 


Identifying the Sample Space of a Probability Experiment In
Exercises 25-32, identify the sample space of the probability experiment and
determine the number of outcomes in the sample space. Draw a tree diagram when
appropriate.


25. Guessing the initial of a student’s middle name 
26. Guessing a student’s letter grade (A, B, C, D, F) in a class
27. Drawing one card from a standard deck of cards 


28. Identifying a person’s eye color (blue, brown, green, hazel, gray, other) and
hair color (blonde, black, brown, red, other)


29. Tossing two coins 
30. Tossing three coins 
31. Rolling a pair of six-sided dice 


32. Rolling a six-sided die, tossing two coins, and then drawing one card from a 
hand of three cards 


Identifying Simple Events In Exercises 33-36, determine the number of
outcomes in the event. Then decide whether the event is a simple event or not.
Explain your reasoning.


33. A spreadsheet is used to randomly generate a number from 1 to 2000. 
Event A is generating the number 253. 


34. A spreadsheet is used to randomly generate a number from 1 to 4000. 
Event B is generating a number less than 500. 


35. You randomly select one card from a standard deck of 52 playing cards. 
Event A is selecting a diamond. 


36. You randomly select one card from a standard deck of 52 playing cards. 
Event B is selecting the ace of spades. 


Using the Fundamental Counting Principle In Exercises 37-40, use the 
Fundamental Counting Principle. 


37. Menu A restaurant offers a $20 dinner special that lets you choose from 
8 appetizers, 11 entrées, and 7 desserts. How many different meals are 
available when you select an appetizer, an entrée, and a dessert? 


38. Tablet A tablet has 4 choices for an operating system, 3 choices for a screen 
size, 4 choices for a processor, 6 choices for memory size, and 3 choices for a 
battery. How many ways can you customize the tablet? 


39. Realty A realtor uses a lock box to store the keys to a house that is for sale. 
The access code for the lock box consists of five digits. The first digit cannot 
be zero or nine and the last digit must be a multiple of 3. How many different 
codes are available? 


40. Yes or No Quiz Assuming that no questions are left unanswered, in how 
many ways can a ten-question yes or no quiz be answered? 




Response | Number of cars, f
None | 960
One | 1800
Two | 920
Three or more | 280

TABLE FOR EXERCISES 47 AND 48


Ages | Frequency, f (in millions)
18 to 29 | 48.9
30 to 44 | 53.9
45 to 64 | 78.1
65 and over | 46.0

TABLE FOR EXERCISES 49-52


Ages: 0-14, 15-29, 30-44, 45-59, 60-74, 75 and over
Frequency, f: 173, 122

TABLE FOR EXERCISES 59-62


Finding Classical Probabilities In Exercises 41-46, a probability experiment
consists of rolling a 12-sided die, numbered 1 to 12. Find the probability of the
event.


41. Event A: rolling a 2   42. Event B: rolling a 10

43. Event C: rolling a number less than 3

44. Event D: rolling a number more than 11

45. Event E: rolling a number divisible by 4

46. Event F: rolling a number divisible by 4

Finding Empirical Probabilities A polling organization is asking a sample
of U.K. households how many cars they have. The frequency distribution at the left
shows the results. In Exercises 47 and 48, use the frequency distribution. (Adapted
from Statista)


47. What is the probability that the next household asked does not have a car? 
48. What is the probability that the next household asked has one car? 


Using a Frequency Distribution to Find Probabilities In Exercises
49-52, use the frequency distribution at the left, which shows the number of
voting-age American citizens (in millions) by age, to find the probability that a
citizen chosen at random is in the age range. (Source: U.S. Census Bureau)


49. 18 to 29 years old 50. 30 to 44 years old 
51. 45 to 64 years old 52. 65 years old and older 


Classifying Types of Probability In Exercises 53-58, classify the statement
as an example of classical probability, empirical probability, or subjective
probability. Explain your reasoning.


53. The probability that a randomly selected number from 1 to 50 is divisible by 
4 is 0.24. 


54. You think that a candidate’s probability of winning the next election is 
about 0.65. 


55. According to a survey, the probability that an adult chosen at random 
watches a movie every day is about 0.60. 


56. According to a survey, the probability that an adult chosen at random is in 
favor of a sprinkling ban is about 0.45. 


57. The probability that a randomly selected number from 1 to 100 is divisible 
by 6 is 0.16. 


58. You think that a football team’s probability of winning its next game is 
about 0.80. 


Finding the Probability of the Complement of an Event The age 
distribution of the residents of Kadoka, South Dakota, is shown at the left. In 
Exercises 59-62, find the probability of the event. (Adapted from U.S. Census Bureau) 


59. Event A: randomly choosing a resident who is not 15 to 29 years old 
60. Event B: randomly choosing a resident who is not 45 to 59 years old 
61. Event C: randomly choosing a resident who is not 14 years old or younger 


62. Event D: randomly choosing a resident who is not 75 years old or older 


FIGURE FOR EXERCISES 63-66


2016 Presidential Election: Voters from Virginia (pie chart; legible labels: about 1.8 million voted Republican; another party)

FIGURE FOR EXERCISE 73


All Registered Voters in Texas (pie chart): about 9.0 million voted in the 2016 presidential election; about 6.1 million did not vote in the 2016 presidential election

FIGURE FOR EXERCISE 74


Level of Education (bar graph): number of employees by highest level of education (Doctoral, Master’s, Bachelor’s, Associate’s, High school diploma, Other)

FIGURE FOR EXERCISES 75-78




Using a Tree Diagram In Exercises 63-66, a probability experiment consists
of rolling a six-sided die and spinning the spinner shown at the left. The spinner is
equally likely to land on each color. Use a tree diagram to find the probability of
the event. Then explain whether the event can be considered unusual.


63. Event A: rolling a 5 and the spinner landing on blue 

64. Event B: rolling an odd number and the spinner landing on green 

65. Event C: rolling a number less than 6 and the spinner landing on yellow 

66. Event D: not rolling a number less than 6 and the spinner landing on yellow 

67. Access Code An access code consists of three digits. Each digit can be any 
number from 0 through 9, and each digit can be repeated. 


(a) What is the probability of randomly selecting the correct access code on 
the first try? 


(b) What is the probability of not selecting the correct access code on the 
first try? 


68. Access Code An access code consists of two characters. Each character can
be any letter from A through Z, and each letter can be repeated.


(a) What is the probability of randomly selecting the correct access code on 
the first try? 


(b) What is the probability of not selecting the correct access code on the 
first try? 
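Exercises 67 and 68 rest on the Fundamental Counting Principle: when each position of a code is filled from the same set of symbols with repetition allowed, the number of possible codes is the number of symbols raised to the code length, and exactly one of them is correct. A minimal sketch, assuming every code is equally likely to be guessed (the function name is illustrative):

```python
from fractions import Fraction

def guess_probabilities(num_symbols: int, length: int):
    """Return (P(correct on first try), P(not correct on first try)) for a
    code of `length` positions, each filled from `num_symbols` symbols
    with repetition allowed."""
    total_codes = num_symbols ** length      # Fundamental Counting Principle
    p_correct = Fraction(1, total_codes)     # exactly one code is correct
    return p_correct, 1 - p_correct          # complement rule

# Exercise 67: three digits, each 0 through 9
p, q = guess_probabilities(10, 3)
print(p, float(q))    # 1/1000 0.999

# Exercise 68: two letters, each A through Z
p, q = guess_probabilities(26, 2)
print(p, float(q))
```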


Wet or Dry? You are planning a three-day trip to Seattle, Washington, in October.
In Exercises 69-72, use the fact that on each day, it could be either sunny or rainy.


69. What is the probability that it is sunny all three days? 
70. What is the probability that it rains all three days? 
71. What is the probability that it rains on exactly one day? 


72. What is the probability that it rains on at least one day? 
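One way to answer Exercises 69-72 is to list the whole sample space. The sketch below assumes that sunny and rainy are equally likely on each day and that the days are independent (a simplification, not a weather model), so each of the 2 · 2 · 2 = 8 weather sequences has probability 1/8.

```python
from fractions import Fraction
from itertools import product

# Every weather sequence for three days: "S" = sunny, "R" = rainy.
# Assumption: all 8 sequences are equally likely.
outcomes = list(product("SR", repeat=3))

def prob(event) -> Fraction:
    """Classical probability: favorable outcomes / total outcomes."""
    favorable = sum(1 for seq in outcomes if event(seq))
    return Fraction(favorable, len(outcomes))

print(prob(lambda seq: seq.count("R") == 0))   # sunny all three days: 1/8
print(prob(lambda seq: seq.count("R") == 3))   # rains all three days: 1/8
print(prob(lambda seq: seq.count("R") == 1))   # rains on exactly one day: 3/8
print(prob(lambda seq: seq.count("R") >= 1))   # rains on at least one day: 7/8
```

The last line could also be found with the complement rule, 1 − P(sunny all three days).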


Graphical Analysis In Exercises 73 and 74, use the diagram at the left.


73. What is the probability that a voter from Virginia chosen at random voted 
Republican in the 2016 presidential election? (Source: Virginia Department of 
Elections) 


74. What is the probability that a registered voter in Texas chosen at random did
not vote in the 2016 presidential election? (Source: Texas Secretary of State) 


Using a Bar Graph to Find Probabilities In Exercises 75-78, use the bar
graph at the left, which shows the highest level of education received by employees
of a company. Find the probability that the highest level of education for an
employee chosen at random is

75. a doctorate.
76. an associate’s degree.
77. a master’s degree.
78. a high school diploma.


79. Unusual Events Can any of the events in Exercises 49-52 be considered 
unusual? Explain. 


80. Unusual Events Can any of the events in Exercises 75-78 be considered 
unusual? Explain. 


166 CHAPTER 3. Probability 


     R    W
R    RR   RW
W    WR   WW

FIGURE FOR EXERCISE 81


Workers (in thousands) by Industry for the U.S. (pie chart; recoverable labels and values: Agriculture, forestry, fishing, and hunting: 2422; Manufacturing; Mining, quarrying, oil and gas extraction, and construction: 10,852; other values not recoverable)

FIGURE FOR EXERCISES 83-86


1 | 6
2 | 12234788
3 | 0013333444555667777777788899999
4 | 0000011112222333333444445556666777888889999
5 | 00000112223333444445556668899
6 | 0001112223337788

81. Genetics A Punnett square is a diagram that shows all possible gene
combinations in a cross of parents whose genes are known. When two pink
snapdragon flowers (RW) are crossed, there are four equally likely possible
outcomes for the genetic makeup of the offspring: red (RR), pink (RW),
pink (WR), and white (WW), as shown in the Punnett square at the left.
When two pink snapdragons are crossed, what is the probability that the
offspring will be (a) pink, (b) red, and (c) white?

82. Genetics There are six basic types of coloring in registered collies:
sable (SSmm), tricolor (ssmm), trifactored sable (Ssmm), blue merle
(ssMm), sable merle (SSMm), and trifactored sable merle (SsMm). The
Punnett square below shows the possible coloring of the offspring of
a trifactored sable merle collie and a trifactored sable collie. What is
the probability that the offspring will have the same coloring as one of
its parents?


Parents: Ssmm and SsMm

      SM     Sm     sM     sm
Sm    SSMm   SSmm   SsMm   Ssmm
Sm    SSMm   SSmm   SsMm   Ssmm
sm    SsMm   Ssmm   ssMm   ssmm
sm    SsMm   Ssmm   ssMm   ssmm
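Both genetics exercises come down to counting equally likely cells in a Punnett square. The hypothetical helper below builds the square from each parent's gametes (one allele per gene, in the same gene order) and sorts each pair of alleles by character code, which puts uppercase before lowercase, so that sm combined with sM is recorded as ssMm and W combined with R as RW. For Exercise 82 the gametes are read from the square: Sm, Sm, sm, sm for the Ssmm parent and SM, Sm, sM, sm for the SsMm parent. It assumes every cell is equally likely, as the exercises state.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def combine(gamete1: str, gamete2: str) -> str:
    """Join two gametes gene by gene; each allele pair is sorted by
    character code (uppercase before lowercase, then alphabetical)."""
    return "".join("".join(sorted(a + b)) for a, b in zip(gamete1, gamete2))

def punnett(gametes1, gametes2) -> Counter:
    """Count the genotypes over every cell of the Punnett square."""
    return Counter(combine(g1, g2) for g1, g2 in product(gametes1, gametes2))

# Exercise 81: two pink snapdragons (RW x RW); RW and WR are both pink
snap = punnett(["R", "W"], ["R", "W"])
print(Fraction(snap["RW"], sum(snap.values())))    # P(pink) = 1/2

# Exercise 82: Ssmm parent (rows) x SsMm parent (columns)
collie = punnett(["Sm", "Sm", "sm", "sm"], ["SM", "Sm", "sM", "sm"])
same_as_parent = collie["Ssmm"] + collie["SsMm"]
print(Fraction(same_as_parent, sum(collie.values())))   # 1/2
```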


Using a Pie Chart to Find Probabilities In Exercises 83-86, use the pie 
chart at the left, which shows the number of workers (in thousands) by industry 
for the United States. (Source: U.S. Bureau of Labor Statistics) 


83. Find the probability that a worker chosen at random is employed in the
services industry.

84. Find the probability that a worker chosen at random is not employed in the
services industry.

85. Find the probability that a worker chosen at random is employed in the
manufacturing industry.

86. Find the probability that a worker chosen at random is not employed in the
agriculture, forestry, fishing, and hunting industry.

87. College Football A stem-and-leaf plot for the numbers of touchdowns
allowed by all 128 NCAA Division I Football Bowl Subdivision teams in
the 2016-2017 season is shown. Find the probability that a team chosen
at random allowed (a) at least 51 touchdowns, (b) between 20 and
30 touchdowns, inclusive, and (c) more than 63 touchdowns. Are any of
these events unusual? Explain. (Source: National Collegiate Athletic Association)


Key: 1|6 = 16 
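Reading probabilities from a stem-and-leaf plot means turning each leaf back into a data value and counting. The sketch below transcribes the plot for Exercise 87 as stems 1 through 6 (key 1|6 = 16); the transcription is an assumption based on the extracted plot, and it totals 128 teams, matching the exercise. Each result is an empirical probability, the frequency of the event divided by 128, and the 0.05 cutoff for "unusual" comes from Section 3.1.

```python
from fractions import Fraction

# Stem-and-leaf plot as transcribed (assumption): stem -> leaves, key 1|6 = 16
plot = {
    1: "6",
    2: "12234788",
    3: "0013333444555667777777788899999",
    4: "0000011112222333333444445556666777888889999",
    5: "00000112223333444445556668899",
    6: "0001112223337788",
}

# Rebuild the data values: value = 10 * stem + leaf
touchdowns = [10 * stem + int(leaf) for stem, leaves in plot.items() for leaf in leaves]
n = len(touchdowns)

def empirical(condition) -> Fraction:
    """Empirical probability: frequency of the event / total frequency."""
    return Fraction(sum(1 for x in touchdowns if condition(x)), n)

p_a = empirical(lambda x: x >= 51)          # at least 51
p_b = empirical(lambda x: 20 <= x <= 30)    # between 20 and 30, inclusive
p_c = empirical(lambda x: x > 63)           # more than 63
for label, p in [("(a)", p_a), ("(b)", p_b), ("(c)", p_c)]:
    verdict = "unusual" if p <= 0.05 else "not unusual"
    print(label, p, round(float(p), 3), verdict)
```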




88. Individual Stock Price An individual stock is selected at random from 
the portfolio represented by the box-and-whisker plot shown. Find the 
probability that the stock price is (a) less than $21, (b) between $21 and $50, 
and (c) $30 or more. 


Box-and-whisker plot (axis: Stock price (in dollars); values not recoverable)


Writing In Exercises 89 and 90, write a statement that represents the complement
of the probability. 


89. The probability of randomly choosing a cricket player who also played for 
his school team. (Assume that you are choosing from the population of all 
cricket players.) 


90. The probability of randomly choosing a smoker whose mother also smoked.
(Assume that you are choosing from the population of all smokers.)


Extending Concepts 


Odds The chances of winning are often written in terms of odds rather than 
probabilities. The odds of winning is the ratio of the number of successful 
outcomes to the number of unsuccessful outcomes. The odds of losing is the ratio 
of the number of unsuccessful outcomes to the number of successful outcomes. 
For example, when the number of successful outcomes is 2 and the number 
of unsuccessful outcomes is 3, the odds of winning are 2:3 (read “2 to 3”). 
In Exercises 91-96, use this information about odds. 
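These definitions translate directly into conversions between odds and probabilities. In the sketch below (function names are illustrative), odds of winning of a:b mean a successful outcomes for every b unsuccessful ones, so P(win) = a/(a + b); going the other way, the ratio P(win) : P(lose) gives the odds in lowest terms.

```python
from fractions import Fraction

def prob_from_odds(successes: int, failures: int) -> Fraction:
    """Odds of winning successes:failures -> probability of winning."""
    return Fraction(successes, successes + failures)

def odds_from_prob(p: Fraction):
    """Probability of winning (strictly between 0 and 1) -> odds of
    winning as a (successes, failures) pair in lowest terms."""
    ratio = Fraction(p) / (1 - Fraction(p))    # P(win) / P(lose), auto-reduced
    return ratio.numerator, ratio.denominator

# The example in the text: 2 successful and 3 unsuccessful outcomes
p_win = prob_from_odds(2, 3)
print(p_win)                        # 2/5
print(odds_from_prob(p_win))        # (2, 3): odds of winning
print(odds_from_prob(1 - p_win))    # (3, 2): odds of losing
```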


91. A beverage company puts game pieces under the caps of its drinks and 
claims that one in six game pieces wins a prize. The official rules of the 
contest state that the odds of winning a prize are 1:6. Is the claim “one in
six game pieces wins a prize” correct? Explain your reasoning. 


92. The probability of winning an instant prize game is The odds of winning a 
different instant prize game are 1 : 10. You want the best chance of winning. 
Which game should you play? Explain your reasoning. 


93. The odds of an event occurring are 4:5. Find (a) the probability that the 
event will occur and (b) the probability that the event will not occur. 


94. A card is picked at random from a standard deck of 52 playing cards. Find 
the odds that it is a spade. 


95. A card is picked at random from a standard deck of 52 playing cards. Find 
the odds that it is not a spade. 


96. The odds of winning an event A are p:q. Show that the probability of event A
is given by $P(A) = \frac{p}{p+q}$.


97. Rolling a Pair of Dice You roll a pair of six-sided dice and record the sum. 


(a) List all of the possible sums and determine the probability of rolling each sum. 

(b) Use technology to simulate rolling a pair of dice and record the sum 
100 times. Make a tally of the 100 sums and use these results to list the 
probability of rolling each sum. 

(c) Compare the probabilities in part (a) with the probabilities in part (b). 
Explain any similarities or differences. 
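For Exercise 97, part (a) can be done by listing all 36 equally likely outcomes, and part (b) by simulation. The sketch below uses Python's standard `random` module to play the role of the "technology"; the seed is arbitrary and only makes a run repeatable, so simulated frequencies will differ from run to run and from the theoretical values.

```python
import random
from collections import Counter
from fractions import Fraction
from itertools import product

# (a) Theoretical probabilities: 36 equally likely (die 1, die 2) outcomes
sum_counts = Counter(d1 + d2 for d1, d2 in product(range(1, 7), repeat=2))
theoretical = {s: Fraction(c, 36) for s, c in sorted(sum_counts.items())}

# (b) Simulate 100 rolls of a pair of dice and tally the sums
rng = random.Random(2019)                     # arbitrary seed for repeatability
rolls = [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(100)]
tally = Counter(rolls)
empirical = {s: tally[s] / 100 for s in range(2, 13)}

# (c) Compare the two side by side
for s in range(2, 13):
    print(f"{s:>2}  theoretical {float(theoretical[s]):.3f}  simulated {empirical[s]:.2f}")
```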


ACTIVITY 


Simulating the Stock Market 


APPLET

You can find the interactive
applet for this activity
within MyLab Statistics or at
www.pearsonglobaleditions.com.


The Simulating the Stock Market applet allows you to investigate the probability
that the stock market will go up on any given day. The plot at the top left corner 
shows the probability associated with each outcome. In this case, the market 
has a 50% chance of going up on any given day. When SIMULATE is clicked, 
outcomes for n days are simulated. The results of the simulations are shown in 
the frequency plot. When the animate option is checked, the display will show 
each outcome dropping into the frequency plot as the simulation runs. The 
individual outcomes are shown in the text field at the far right of the applet. 
The center plot shows in red the cumulative proportion of times that the market 
went up. The green line in the plot reflects the theoretical probability of the 
market going up. As the experiment is conducted over and over, the cumulative 
proportion should converge to the theoretical probability. 
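The applet's behavior can be imitated in a few lines. This sketch is not the applet's code; it assumes, as in the setting described above, that the market goes up on any given day with probability 0.5, independently of other days, and it tracks the cumulative proportion of up days, which tends to settle near 0.5 as n grows.

```python
import random
from typing import List, Optional

def cumulative_up_proportions(n_days: int, p_up: float = 0.5,
                              seed: Optional[int] = None) -> List[float]:
    """Simulate n_days of independent up/down days and return the running
    proportion of up days after each day."""
    rng = random.Random(seed)
    ups = 0
    proportions = []
    for day in range(1, n_days + 1):
        if rng.random() < p_up:          # "up" with probability p_up
            ups += 1
        proportions.append(ups / day)    # cumulative proportion so far
    return proportions

# Larger n should usually land closer to the theoretical 0.5
for n in (10, 100, 10_000):
    final = cumulative_up_proportions(n, seed=1)[-1]
    print(f"n = {n:>6}: proportion of up days = {final:.3f}")
```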


(Applet display: a probability plot for the outcomes Up and Down, a frequency plot of the simulations, and a SIMULATE button.)


Step 1 Specify a value for n.
Step 2 Click SIMULATE four times.
Step 3 Click RESET.
Step 4 Specify another value for n.
Step 5 Click SIMULATE.


DRAW CONCLUSIONS 


1. Run the simulation using n = 1 without clicking RESET. How many days did
it take until there were three straight days on which the stock market went up?
How many days did it take until there were three straight days on which the
stock market went down?


2. Run the applet to simulate the stock market activity over the next 35 business 
days. Find the empirical probability that the market goes up on day 36. 
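Question 1 can also be explored outside the applet. A minimal sketch under the same assumption (independent days with P(up) = 0.5; the function name is illustrative) counts the days until the first run of three up days or three down days. Repeating it many times gives an average waiting time; for a 50/50 process the theoretical average for three in a row is 14 days.

```python
import random
from typing import Optional

def days_until_streak(direction: str, length: int = 3, p_up: float = 0.5,
                      rng: Optional[random.Random] = None) -> int:
    """Count simulated days until `length` consecutive days go `direction`
    ("up" or "down"), with independent days and P(up) = p_up."""
    if rng is None:
        rng = random.Random()
    run = days = 0
    while run < length:
        days += 1
        move = "up" if rng.random() < p_up else "down"
        run = run + 1 if move == direction else 0   # extend or reset the streak
    return days

rng = random.Random(42)                 # arbitrary seed, only for repeatability
print("days until 3 ups:", days_until_streak("up", rng=rng))
print("days until 3 downs:", days_until_streak("down", rng=rng))

# Averaging many runs estimates the expected waiting time
trials = [days_until_streak("up", rng=rng) for _ in range(20_000)]
print("average days until 3 ups:", sum(trials) / len(trials))
```

Because the days are independent in this model, the simulated history of the first 35 days carries no information about day 36; its probability of going up is still 0.5.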




3.2 


What You Should Learn 


» How to find the probability of 
an event given that another 
event has occurred 

» How to distinguish between 
independent and dependent 
events 

» How to use the Multiplication
Rule to find the probability of 
two or more events occurring 
in sequence and to find 
conditional probabilities 


Have you ever been offended by something on social media?

Gender    Yes     No      Total
Female    619     549     1168
Male      532     576     1108
Total     1151    1125    2276

Sample Space

Gender    Yes
Female    619
Male      532
Total     1151


Conditional Probability and the Multiplication Rule 


SECTION 3.2 Conditional Probability and the Multiplication Rule 169 


Conditional Probability • Independent and Dependent Events • The Multiplication Rule


Conditional Probability 


In this section, you will learn how to find the probability that two events occur in 
sequence. Before you can find this probability, however, you must know how to 
find conditional probabilities. 


DEFINITION 


A conditional probability is the probability of an event occurring, given that
another event has already occurred. The conditional probability of event B
occurring, given that event A has occurred, is denoted by P(B|A) and is read
as “probability of B, given A.”


Finding Conditional Probabilities 

1. Two cards are selected in sequence from a standard deck of 52 playing 
cards. Find the probability that the second card is a queen, given that the 
first card is a king. (Assume that the king is not replaced.) 


2. The table at the left shows the results of a survey in which 2276 social media 
users were asked whether they have ever been offended by something 
(posts, comments, or photos) they saw on social media. Find the probability 
that a user is male, given that the user was offended by something on social 
media. (Adapted from The Harris Poll) 


SOLUTION 


1. Because the first card is a king and is not replaced, the remaining deck has
51 cards, 4 of which are queens. So,

$$P(B|A) = \frac{4}{51} \approx 0.078.$$

The probability that the second card is a queen, given that the first card is a
king, is about 0.078.


2. There are 1151 users who said they were offended by something on social
media. So, the sample space consists of these 1151 users, as shown at the left.
Of these, 532 are males. So,

$$P(B|A) = \frac{532}{1151} \approx 0.462.$$

The probability that a user is male, given that the user was offended, is
about 0.462. 
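The reasoning in part 2 (shrink the sample space to the given condition, then count) can be written as a small function over the survey table. The dictionary layout and function name are illustrative; the counts are the ones in the table.

```python
from fractions import Fraction

# Survey counts from the table: gender -> {response: count}
survey = {
    "Female": {"Yes": 619, "No": 549},
    "Male":   {"Yes": 532, "No": 576},
}

def p_gender_given_response(gender: str, response: str) -> Fraction:
    """P(gender | response): restrict the sample space to one response column."""
    column_total = sum(row[response] for row in survey.values())   # new sample space
    return Fraction(survey[gender][response], column_total)

p = p_gender_given_response("Male", "Yes")
print(p, round(float(p), 3))    # 532/1151 0.462
```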


TRY IT YOURSELF 1 

Refer to the survey in the second part of Example 1. Find the probability that 
a user is female, given that the user was not offended by something on social 
media. Answer: Page A34 




Picturing the World


Truman Collins, a probability and statistics enthusiast, wrote a
program that finds the probability of landing on each square of a
Monopoly® board during a game. Collins explored various scenarios,
including the effects of the Chance and Community Chest cards and
the various ways of landing in or getting out of jail. Interestingly,
Collins discovered that the length of each jail term affects the
probabilities. (Note that the probabilities are rounded to more
than three decimal places so that it is easier to see how going to jail
affects the probabilities.)


Monopoly square   Probability given   Probability given
                  short jail term     long jail term
Go                0.0310              0.0291
Chance            0.0087              0.0082
In Jail           0.0395              0.0946
Free Parking      0.0288              0.0283
Park Place        0.0219              0.0206
B&O RR            0.0307              0.0289
Water Works       0.0281              0.0265


Why do the probabilities depend 
on how long you stay in jail? 


Independent and Dependent Events 


In some experiments, one event does not affect the probability of another. 
For instance, when you roll a die and toss a coin, the outcome of the roll of 
the die does not affect the probability of the coin landing heads up. These 
two events are independent. The question of the independence of two or more 
events is important to researchers in fields such as marketing, medicine, and 
psychology. You can use conditional probabilities to determine whether events 
are independent. 


DEFINITION 


Two events are independent when the occurrence of one of the events does 
not affect the probability of the occurrence of the other event. Two events A 
and B are independent when 


P(B|A) = P(B)     (Occurrence of A does not affect probability of B.)
or when
P(A|B) = P(A).    (Occurrence of B does not affect probability of A.)


Events that are not independent are dependent. 


To determine whether A and B are independent, first calculate P(B), the 
probability of event B. Then calculate P(B|A), the probability of B, given A. If 
the values are equal, then the events are independent. If P(B) ≠ P(B|A), then
A and B are dependent events. 
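As an illustration of this test (not one of the text's examples), the sketch below applies it to the social media survey from Example 1, with A = the user was offended and B = the user is male. Exact fractions avoid calling two rounded decimals equal or unequal by accident.

```python
from fractions import Fraction

# Counts from the social media survey (Example 1)
male_total, grand_total = 1108, 2276     # row total for males, overall total
male_yes, yes_total = 532, 1151          # offended males, all offended users

p_b = Fraction(male_total, grand_total)        # P(B) = P(male)
p_b_given_a = Fraction(male_yes, yes_total)    # P(B|A) = P(male | offended)

independent = p_b == p_b_given_a
print(f"P(B) = {float(p_b):.3f}, P(B|A) = {float(p_b_given_a):.3f}")
print("independent" if independent else "dependent")
```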


Classifying Events as Independent or Dependent 
Determine whether the events are independent or dependent. 


1. Selecting a king (A) from a standard deck of 52 playing cards, not replacing 
it, and then selecting a queen (B) from the deck 


2. Tossing a coin and getting a head (A), and then rolling a six-sided die and 
obtaining a 6 (B) 


3. Driving over 85 miles per hour (A), and then getting in a car accident (B) 


SOLUTION 


1. $P(B) = \frac{4}{52}$ and $P(B|A) = \frac{4}{51}$. The occurrence of A changes the probability
of the occurrence of B, so the events are dependent.

2. $P(B) = \frac{1}{6}$ and $P(B|A) = \frac{1}{6}$. The occurrence of A does not change the
probability of the occurrence of B, so the events are independent.


3. Driving over 85 miles per hour increases the chances of getting in an 
accident, so these events are dependent. 


TRY IT YOURSELF 2 
Determine whether the events are independent or dependent. 


1. Smoking a pack of cigarettes per day (A) and developing emphysema, a 
chronic lung disease (B) 
2. Tossing a coin and getting a head (A), and then tossing the coin again and 
getting a tail (B) 
Answer: Page A34 


In words, to use the Multiplication Rule,
1. find the probability that the first event occurs,
2. find the probability that the second event occurs given that the first event has occurred, and
3. multiply these two probabilities.


Study Tip

Recall from Section 3.1 that a probability of 0.05 or less
is typically considered unusual. In the first part of
Example 3, 0.006 < 0.05. This means that selecting
a king and then a queen (without replacement) from a standard deck
is an unusual event.




The Multiplication Rule 


To find the probability of two events occurring in sequence, you can use the 
Multiplication Rule. 


The Multiplication Rule for the Probability of A and B 


The probability that two events A and B will occur in sequence is

$$P(A \text{ and } B) = P(A) \cdot P(B|A). \qquad \text{(Events A and B are dependent.)}$$

If events A and B are independent, then the rule can be simplified to

$$P(A \text{ and } B) = P(A) \cdot P(B). \qquad \text{(Events A and B are independent.)}$$

This simplified rule can be extended to any number of independent events.


Using the Multiplication Rule to Find Probabilities 


1. Two cards are selected, without replacing the first card, from a standard 
deck of 52 playing cards. Find the probability of selecting a king and then 
selecting a queen. 


2. A coin is tossed and a die is rolled. Find the probability of tossing a head 
and then rolling a 6. 


SOLUTION 
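1. Because the first card is not replaced, the events are dependent.

$$P(K \text{ and } Q) = P(K) \cdot P(Q|K) = \frac{4}{52} \cdot \frac{4}{51} = \frac{16}{2652} \approx 0.006$$

So, the probability of selecting a king and then a queen without replacement
is about 0.006.

2. The events are independent.

$$P(H \text{ and } 6) = P(H) \cdot P(6) = \frac{1}{2} \cdot \frac{1}{6} = \frac{1}{12} \approx 0.083$$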
So, the probability of tossing a head and then rolling a 6 is about 0.083. 
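Both parts of this example use the same rule; only the second factor changes. A minimal sketch with exact fractions, rounding only for display:

```python
from fractions import Fraction

# 1. Dependent events: a king, then a queen, without replacement
p_king = Fraction(4, 52)
p_queen_given_king = Fraction(4, 51)       # 51 cards left, 4 of them queens
p_k_and_q = p_king * p_queen_given_king    # P(K and Q) = P(K) * P(Q|K)

# 2. Independent events: a head on the coin, then a 6 on the die
p_head = Fraction(1, 2)
p_six = Fraction(1, 6)
p_h_and_6 = p_head * p_six                 # P(H and 6) = P(H) * P(6)

print(p_k_and_q, round(float(p_k_and_q), 3))   # 4/663 0.006
print(p_h_and_6, round(float(p_h_and_6), 3))   # 1/12 0.083
```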


TRY IT YOURSELF 3 
1. The probability that a salmon swims successfully through a dam is 0.85. Find 
the probability that two salmon swim successfully through the dam. 
2. Two cards are selected from a standard deck of 52 playing cards without 
replacement. Find the probability that they are both hearts. 
Answer: Page A34 




Using the Multiplication Rule to Find Probabilities 


For anterior cruciate ligament (ACL) reconstructive surgery, the probability 
that the surgery is successful is 0.95. (Source: The Orthopedic Center of St. Louis) 


1. Find the probability that three ACL surgeries are successful. 
2. Find the probability that none of the three ACL surgeries are successful. 
3. Find the probability that at least one of the three ACL surgeries is successful. 


SOLUTION 


1. The probability that each ACL surgery is successful is 0.95. The chance of 
success for one surgery is independent of the chances for the other surgeries. 


P(three surgeries are successful) = (0.95)(0.95)(0.95) ≈ 0.857
So, the probability that all three surgeries are successful is about 0.857. 


2. Because the probability of success for one surgery is 0.95, the probability of 
failure for one surgery is 1 − 0.95 = 0.05.


P(none of the three are successful) = (0.05)(0.05)(0.05) ≈ 0.0001


So, the probability that none of the surgeries are successful is about 0.0001. Note 
that because 0.0001 is less than 0.05, this can be considered an unusual event. 


3. The phrase “at least one” means one or more. The complement to the 
event “at least one is successful” is the event “none are successful.” Use the 
complement found in part 2 to find the probability. (To avoid rounding in 
the second step below, use (0.05)(0.05)(0.05), not the rounded result.) 


P(at least one is successful) = 1 − P(none are successful)
                              = 1 − (0.05)(0.05)(0.05)
                              ≈ 0.9999


So, the probability that at least one of the three surgeries is successful is 
about 0.9999. Note that this probability is not rounded to three decimal 
places because the result would be 1.000, which implies the event is certain. 
Even though it is highly likely that at least one of the three surgeries is 
successful, it is not a certain event. 


TRY IT YOURSELF 4 

The probability that a particular rotator cuff surgery is successful is 0.9. (Source: 
The Orthopedic Center of St. Louis) 

1. Find the probability that three rotator cuff surgeries are successful. 


2. Find the probability that none of the three rotator cuff surgeries are successful. 


3. Find the probability that at least one of the three rotator cuff surgeries 
is successful. 
Answer: Page A34 


In Example 4, you were asked to find a probability using the phrase “at 
least one.” Notice that it was easier to find the probability of its complement, 
“none,” and then subtract the probability of its complement from 1. In general, 
this probability can be written as 


P(at least one occurrence of event A) = 1 − P(no occurrence of event A).
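For n independent trials that each succeed with probability p, "none" has probability (1 − p)^n, so "at least one" has probability 1 − (1 − p)^n. The sketch below (the function name is illustrative) reproduces the ACL surgery numbers from Example 4 and keeps full precision until the final rounding, as the solution recommends.

```python
def at_least_one(p: float, n: int) -> float:
    """P(at least one success in n independent trials, each with success probability p)."""
    p_none = (1 - p) ** n      # every trial fails
    return 1 - p_none          # complement of "none"

p_success = 0.95
print(round(p_success ** 3, 3))               # all three succeed: 0.857
print(round((1 - p_success) ** 3, 4))         # none succeed: 0.0001
print(round(at_least_one(p_success, 3), 4))   # at least one succeeds: 0.9999
```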




Using the Multiplication Rule to Find Probabilities 


In a recent year, there were 18,187 U.S. allopathic medical school seniors 
who applied to residency programs and submitted their residency program 
choices. Of these seniors, 17,057 were matched with residency positions, 
with about 79.2% getting one of their top three choices. Medical students 
rank the residency programs in their order of preference, and program 
directors in the United States rank the students. The term “match” refers 
to the process whereby a student’s preference list and a program director’s 
preference list overlap, resulting in the placement of the student in a residency 
position. (Source: National Resident Matching Program) 


1. Find the probability that a randomly selected senior was matched with a 
residency position and it was one of the senior’s top three choices. 


2. Find the probability that a randomly selected senior who was matched 
with a residency position did not get matched with one of the senior’s top 
three choices. 


3. Would it be unusual for a randomly selected senior to be matched with a 
residency position and that it was one of the senior’s top three choices? 
SOLUTION 


Let A = {matched with residency position} and B = {matched with one of
top three choices}. So,

$$P(A) = \frac{17{,}057}{18{,}187} \quad \text{and} \quad P(B|A) = 0.792.$$

1. The events are dependent.

$$P(A \text{ and } B) = P(A) \cdot P(B|A) = \left(\frac{17{,}057}{18{,}187}\right)(0.792) \approx 0.743$$

So, the probability that a randomly selected senior was matched with one of
the senior’s top three choices is about 0.743.


2. To find this probability, use the complement. 
P(B'|A) = 1 − P(B|A) = 1 − 0.792 = 0.208


So, the probability that a randomly selected senior was matched with 
a residency position that was not one of the senior’s top three choices 
is 0.208. 


3. It is not unusual because the probability of a senior being matched with a 
residency position that was one of the senior’s top three choices is about 
0.743, which is greater than 0.05. In fact, with a probability of 0.743, this 
event is likely to happen. 
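The same three steps for the residency match example, as a minimal sketch; the 0.05 threshold for "unusual" is the one from Section 3.1, and rounding happens only when printing.

```python
from fractions import Fraction

p_matched = Fraction(17_057, 18_187)          # P(A): matched with a residency position
p_top3_given_matched = Fraction(792, 1000)    # P(B|A) = 0.792

p_both = p_matched * p_top3_given_matched              # Multiplication Rule
p_not_top3_given_matched = 1 - p_top3_given_matched    # complement, given A

print(round(float(p_both), 3))                # 0.743
print(float(p_not_top3_given_matched))        # 0.208
print("unusual" if p_both <= Fraction(5, 100) else "not unusual")
```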


TRY IT YOURSELF 5 


In a jury selection pool, 65% of the people are female. Of these 65%, one out 
of four works in a health field. 


1. Find the probability that a randomly selected person from the jury pool is 
female and works in a health field. Is this event unusual? 


2. Find the probability that a randomly selected person from the jury pool is 
female and does not work in a health field. Is this event unusual? 
Answer: Page A34 




3.2 EXERCISES 


For Extra Help: MyLab Statistics 


Building Basic Skills and Vocabulary 
1. What is the difference between independent and dependent events? 


2. Give an example of 
(a) two events that are independent. 
(b) two events that are dependent. 


3. What does the notation P(B|A) mean? 


4. Explain how to use the complement to find the probability of getting at least 
one item of a particular type. 


True or False? In Exercises 5 and 6, determine whether the statement is true or
false. If it is false, rewrite it as a true statement. 


5. If two events are independent, then P(A|B) = P(B). 
6. If events A and B are dependent, then P(A and B) = P(A) · P(B).


Using and Interpreting Concepts 


Finding Conditional Probabilities In Exercises 7 and 8, use the table to
find each conditional probability. 


7. Business Degrees The table shows the numbers of male and female students 
in the United States who received bachelor’s degrees in business in a recent 
year. (Source: National Center for Educational Statistics) 


Business degrees Nonbusiness degrees Total 
Male 191,310 621,359 812,669 
Female 172,489 909,776 1,082,265 
Total 363,799 1,531,135 1,894,934 


(a) Find the probability that a randomly selected student is male, given that 
the student received a business degree. 


(b) Find the probability that a randomly selected student received a business 
degree, given that the student is female. 


8. Leisure Trips The table shows the results of a survey in which 300 managers 
and 300 executives aged 25 to 50 were asked if they go for annual leisure trips. 


              Go for trips    Do not go for trips    Total
Managers      175             125                    300
Executives    150             150                    300
Total         325             275                    600


(a) Find the probability that a randomly selected employee goes for leisure 
trips, given that the employee is an executive. 

(b) Find the probability that a randomly selected employee is a manager, 
given that the employee does not go for leisure trips. 




Classifying Events as Independent or Dependent In Exercises
9-14, determine whether the events are independent or dependent. Explain your 
reasoning. 


9. Selecting an ace from a standard deck of 52 playing cards and, without 
replacing it, selecting a jack from the deck. 


10. A father having hazel eyes and a daughter having hazel eyes 

11. Returning a rented movie after the due date and receiving a late fee 

12. Not putting money in a parking meter and getting a parking ticket 

13. Returning a library book before the due date and getting a new book issued 


14. A ball is selected from a bin of balls numbered from 1 through 52. It is 
replaced, and then a second numbered ball is selected from the bin. 


Classifying Events Based on Studies In Exercises 15-18, identify the two
events described in the study. Do the results indicate that the events are independent 
or dependent? Explain your reasoning. 


15. A study found that women have more cone cells, but they do not perceive 
color differently from men. (Source: Scienceblogs) 


16. Certain components in coffee have been found to cause the body to produce 
higher amounts of acid, which can irritate already existing stomach ulcers. 
But, coffee does not cause stomach ulcers. (Source: Top 10 Home Remedies) 


17. A study found that eating a few pieces of chocolate each week can improve 
your cardiovascular health. (Source: SFGate) 


18. According to researchers, high engagement with mobile technology for 
escapism is linked to depression and anxiety in college-age students. (Source:
University of Illinois) 


Using the Multiplication Rule In Exercises 19-32, use the Multiplication
Rule. 


19. Cards Two cards are selected from a standard deck of 52 playing cards. 
The first card is not replaced before the second card is selected. Find the 
probability of selecting a king and then selecting a queen. 


20. Coin and Die A coin is tossed and a die is rolled. Find the probability of 
tossing a head and then rolling a number less than 3.


21. BRCA1 Gene Research has shown that approximately 1 woman in 
600 carries a mutation of the BRCA1 gene. About 60% of women with 
this mutation develop breast cancer. Find the probability that a randomly 
selected woman will carry the mutation of the BRCA1 gene and will develop 
breast cancer. (Adapted from Susan G. Komen) 


Sample Space: Women (Venn diagram: women with mutated BRCA1 gene; women who develop breast cancer)




22. Pickup Trucks In a survey, 510 U.S. adults were asked whether they drive
a pickup truck and whether they drive a Ford. The results showed that
three in ten adults surveyed drive a Ford. Of the adults surveyed who drive
Fords, two in nine drive a pickup truck. Find the probability that a randomly
selected adult drives a Ford and drives a pickup truck.

Sample Space: U.S. Adults (Venn diagram; recoverable label: pickup trucks)

23. Celebrities as Role Models In a sample of 1000 U.S. adults, 200 think
that most Hollywood celebrities are good role models. Two U.S. adults are
selected at random without replacement. (Adapted from Rasmussen Reports)

(a) Find the probability that both adults think that most Hollywood
celebrities are good role models.

(b) Find the probability that neither adult thinks that most Hollywood
celebrities are good role models.

(c) Find the probability that at least one of the two adults thinks that most
Hollywood celebrities are good role models.

24. Knowing a Murder Victim In a sample of 1000 U.S. adults, 300 said they
know a murder victim. Four U.S. adults are selected at random without
replacement. (Adapted from Rasmussen Reports)

(a) Find the probability that all four adults know a murder victim.

(b) Find the probability that none of the four adults knows a murder victim.

(c) Find the probability that at least one of the four adults knows a murder
victim.

25. Best President In a sample of 1446 U.S. registered voters, 217 said that
John Kennedy was the best president since World War II. Two registered
voters are selected at random without replacement. (Adapted from Quinnipiac
University)

(a) Find the probability that both registered voters say that John Kennedy
was the best president since World War II.

(b) Find the probability that neither registered voter says that John Kennedy
was the best president since World War II.

(c) Find the probability that at least one of the two registered voters says
that John Kennedy was the best president since World War II.

(d) Which of the events can be considered unusual? Explain.

26. Worst President In a sample of 1446 U.S. registered voters, 188 said that
Richard Nixon was the worst president since World War II. Three registered
voters are selected at random without replacement. (Adapted from Quinnipiac
University)

(a) Find the probability that all three registered voters say that Richard
Nixon was the worst president since World War II.

(b) Find the probability that none of the three registered voters say that
Richard Nixon was the worst president since World War II.

(c) Find the probability that at most two of the three registered voters say
that Richard Nixon was the worst president since World War II.

(d) Which of the events can be considered unusual? Explain. 
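Exercises 23 through 26 select people without replacement from a fixed sample, so each factor of the Multiplication Rule uses counts that shrink by one after every selection. A minimal sketch with hypothetical helper names, shown with the card setting used earlier in this section (4 kings in a 52-card deck) rather than the exercise data:

```python
from fractions import Fraction

def p_all_have_trait(with_trait: int, population: int, k: int) -> Fraction:
    """P(all k selected without replacement have the trait)."""
    p = Fraction(1)
    for i in range(k):
        # after i selections, i fewer trait members and i fewer items remain
        p *= Fraction(with_trait - i, population - i)
    return p

def p_at_least_one(with_trait: int, population: int, k: int) -> Fraction:
    """Complement of 'none of the k selected have the trait'."""
    return 1 - p_all_have_trait(population - with_trait, population, k)

# Two cards from a standard deck, without replacement
print(p_all_have_trait(4, 52, 2))    # both are kings: 1/221
print(p_at_least_one(4, 52, 2))      # at least one king
```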


27. Blood Types The probability that an African American person in the
United States has type O+ blood is 47%. Six unrelated African American
people in the United States are selected at random. (Source: American
National Red Cross)

(a) Find the probability that all six have type O+ blood.

(b) Find the probability that none of the six have type O+ blood.

(c) Find the probability that at least one of the six has type O+ blood.

(d) Which of the events can be considered unusual? Explain.

28. Blood Types The probability that a Caucasian person in the United States
has type AB− blood is 1%. Four unrelated Caucasian people in the United
States are selected at random. (Source: American National Red Cross)

(a) Find the probability that all four have type AB− blood.

(b) Find the probability that none of the four have type AB− blood.

(c) Find the probability that at least one of the four has type AB− blood.

(d) Which of the events can be considered unusual? Explain.

29. In Vitro Fertilization In a recent year, about 1.6% of all infants born in
the U.S. were conceived through in vitro fertilization (IVF). Of the IVF
deliveries, about 41.1% resulted in multiple births. (Source: American Society
for Reproductive Medicine)

(a) Find the probability that a randomly selected infant was conceived
through IVF and was part of a multiple birth.

(b) Find the probability that a randomly selected infant conceived through
IVF was not part of a multiple birth.

(c) Would it be unusual for a randomly selected infant to have been
conceived through IVF and to have been part of a multiple birth?
Explain.

30. Lottery Tickets According to a survey, 49% of U.S. adults have purchased
a state lottery ticket in the past 12 months. Of these 49%, about 27.5% have
annual incomes less than $36,000. (Adapted from Gallup)

(a) Find the probability that a randomly selected U.S. adult purchased a
state lottery ticket in the past 12 months and has an annual income less
than $36,000.

(b) Find the probability that a randomly selected U.S. adult who purchased
a state lottery ticket in the past 12 months has an annual income greater
than or equal to $36,000.

(c) Would it be unusual for a randomly selected U.S. adult to have
purchased a state lottery ticket in the past 12 months and to have an
annual income less than $36,000? Explain.

31. Digital Content in Schools According to a study, 80% of K-12 schools or
districts in the United States use digital content such as ebooks, audio books,
and digital textbooks. Of these 80%, 4 out of 10 use digital content as part
of their curriculum. Find the probability that a randomly selected school or
district uses digital content and uses it as part of their curriculum. (Source:
School Library Journal)

32. Employment Agency An HR consultant gives an applicant a 75% chance
of getting a job after registering with his agency. If the applicant gets a job,
then there is a 60% chance that the applicant will be satisfied and will not
resign within a year. Find the probability that the applicant gets the job and
does not resign within a year.


178 


CHAPTER 3 Probability 


Extending Concepts 


According to Bayes’ Theorem, the probability of event A, given that event B has 
occurred, is 


\[ P(A \mid B) = \frac{P(A) \cdot P(B \mid A)}{P(A) \cdot P(B \mid A) + P(A') \cdot P(B \mid A')} \]
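To make the formula concrete, here is a minimal Python sketch that evaluates Bayes' Theorem term by term. The function name `bayes` and the sample values are illustrative assumptions, not taken from the exercises; `Fraction` keeps the arithmetic exact, and P(A′) is computed with the complement rule.

```python
from fractions import Fraction


def bayes(p_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) from P(A), P(B|A), and P(B|A') using Bayes' Theorem."""
    p_not_a = 1 - p_a                      # complement rule: P(A') = 1 - P(A)
    numerator = p_a * p_b_given_a          # P(A) * P(B|A)
    denominator = numerator + p_not_a * p_b_given_not_a
    return numerator / denominator


# Hypothetical values: P(A) = 1/2, P(B|A) = 3/5, P(B|A') = 1/5
print(bayes(Fraction(1, 2), Fraction(3, 5), Fraction(1, 5)))  # 3/4
```

Passing decimals such as 0.25 also works, but the result is then a floating-point approximation.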
In Exercises 33–38, use Bayes’ Theorem to find P(A|B).
33. P(A) = = P(B|A) =%,and P(B|A’) =3 
34, P(A) =3,P(A ie B|A) = 3,and P(B|A') =2 
35. P(A) = 0.25, P(A’) = 0.75, P(B|A) = 0.3, and P(B|A') = 0.5 
36. P(A) = 0.62, P(A’) = 0.38, P(B|A) = 0.41, and P(B|A’) = 0.17 
37. P(A) = 73%, P(A’) = 27%, P(B|A) = 46%, and P(B|A’) = 52% 
38. P(A) = 12%, P(A’) = 88%, P(B|A) = 66%, and P(B|A’) = 19% 


39. Reliability of Testing A virus infects one in every 200 people. A test used to detect the virus in a person is positive 80% of the time when the person has the virus and 5% of the time when the person does not have the virus. (This 5% result is called a false positive.) Let A be the event “the person is infected” and B be the event “the person tests positive.”


(a) Using Bayes’ Theorem, when a person tests positive, determine the 
probability that the person is infected. 


(b) Using Bayes’ Theorem, when a person tests negative, determine the 
probability that the person is not infected. 


40. Birthday Problem You are in a class that has 24 students. You want to find the probability that at least two of the students have the same birthday.


(a) Find the probability that each student has a different birthday. 
(b) Use the result of part (a) to find the probability that at least two students 
have the same birthday. 


(c) Use technology to simulate the “Birthday Problem” by generating 
24 random numbers from 1 to 365. Repeat the simulation 10 times. How 
many times did you get at least two people with the same birthday? 


The Multiplication Rule and Conditional Probability By rewriting 
the formula for the Multiplication Rule, you can write a formula for finding 
conditional probabilities. The conditional probability of event B occurring, given 
that event A has occurred, is 


\[ P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)} \]
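The rearranged rule can be checked numerically. This is a small sketch with hypothetical values (not the flight data of Exercises 41 and 42); the helper name `conditional` is an assumption for illustration. Multiplying the result back by P(A) recovers P(A and B), which is the Multiplication Rule.

```python
from fractions import Fraction


def conditional(p_a_and_b, p_a):
    """Return P(B|A) = P(A and B) / P(A); P(A) must be nonzero."""
    if p_a == 0:
        raise ValueError("P(A) must be nonzero")
    return p_a_and_b / p_a


# Hypothetical values: P(A) = 1/2 and P(A and B) = 1/5
p_b_given_a = conditional(Fraction(1, 5), Fraction(1, 2))
print(p_b_given_a)  # 2/5

# Multiplication Rule: P(A and B) = P(A) * P(B|A)
print(Fraction(1, 2) * p_b_given_a)  # 1/5
```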
In Exercises 41 and 42, use the information below. 


• The probability that an airplane flight departs on time is 0.89.
• The probability that a flight arrives on time is 0.87.
• The probability that a flight departs and arrives on time is 0.83.

41. Find the probability that a flight departed on time given that it arrives on time.

42. Find the probability that a flight arrives on time given that it departed on time.


3.3 The Addition Rule


What You Should Learn 


• How to determine whether two events are mutually exclusive

• How to use the Addition Rule to find the probability of two events


Study Tip
In probability and statistics, the word or is mostly used as an “inclusive or” rather than an “exclusive or.” For instance, there are three ways for “event A or B” to occur.

(1) A occurs and B does not occur.
(2) B occurs and A does not occur.
(3) A and B both occur.


Mutually Exclusive Events • The Addition Rule • A Summary of Probability


Mutually Exclusive Events 


In Section 3.2, you learned how to find the probability of two events, A and B, 
occurring in sequence. Such probabilities are denoted by P(A and B). In this 
section, you will learn how to find the probability that at least one of two events 
will occur. Probabilities such as these are denoted by P(A or B) and depend on 
whether the events are mutually exclusive. 


DEFINITION 


Two events A and B are mutually exclusive when A and B cannot occur at the same time. That is, A and B have no outcomes in common.


The Venn diagrams show the relationship between events that are mutually exclusive and events that are not mutually exclusive. Note that when events A and B are mutually exclusive, they have no outcomes in common, so P(A and B) = 0.


[Figure: two Venn diagrams of a sample space. Left: A and B are mutually exclusive. Right: A and B are not mutually exclusive, and their overlap is labeled A and B.]


Recognizing Mutually Exclusive Events
Determine whether the events are mutually exclusive. Explain your reasoning.

1. Event A: Roll a 3 on a die.
Event B: Roll a 4 on a die.

2. Event A: Randomly select a male student.
Event B: Randomly select a nursing major.

3. Event A: Randomly select a blood donor with type O blood.
Event B: Randomly select a female blood donor.

SOLUTION

1. Event A has one outcome, a 3. Event B also has one outcome, a 4. These outcomes cannot occur at the same time, so the events are mutually exclusive.

2. Because the student can be a male nursing major, the events are not mutually exclusive.

3. Because the donor can be a female with type O blood, the events are not mutually exclusive.
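When the outcomes can be listed, as in part 1, the definition can be checked directly by writing each event as a set of outcomes and testing whether the sets share anything. This is a minimal Python sketch; the helper name `mutually_exclusive` and the second pair of die events are illustrative assumptions, not part of the example.

```python
def mutually_exclusive(event_a, event_b):
    """Two events are mutually exclusive when they share no outcomes."""
    return set(event_a).isdisjoint(event_b)


# Part 1: rolling a die, Event A = {3} and Event B = {4}
print(mutually_exclusive({3}, {4}))  # True

# Hypothetical pair: "roll an even number" and "roll a number greater than 4"
print(mutually_exclusive({2, 4, 6}, {5, 6}))  # False, because both contain 6
```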


TRY IT YOURSELF 1 
Determine whether the events are mutually exclusive. Explain your reasoning. 


1. Event A: Randomly select a jack from a standard deck of 52 playing cards. 
Event B: Randomly select a face card from a standard deck of 52 playing cards. 


2. Event A: Randomly select a vehicle that is a Ford. 
Event B: Randomly select a vehicle that is a Toyota. 
Answer: Page A34 


The Addition Rule

To explore this topic further, see Activity 3.3 on page 188.


The Addition Rule for the Probability of A or B 


The probability that events A or B will occur, P(A or B), is given by

P(A or B) = P(A) + P(B) − P(A and B).

If events A and B are mutually exclusive, then the rule can be simplified to

P(A or B) = P(A) + P(B).    (Events A and B are mutually exclusive.)

This simplified rule can be extended to any number of mutually exclusive events.


[Figure: Venn diagram of overlapping events A and B; the outcomes in the overlap are double counted by P(A) + P(B).]

In words, to find the probability that one event or the other will occur, add the individual probabilities of each event and subtract the probability that they both occur. As shown in the Venn diagram at the left, subtracting P(A and B) avoids double counting the probability of outcomes that occur in both A and B.


Using the Addition Rule to Find Probabilities 


1. You select a card from a standard deck of 52 playing cards. Find the probability that the card is a 4 or an ace.

2. You roll a die. Find the probability of rolling a number less than 3 or rolling an odd number.


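SOLUTION

1. A card that is a 4 cannot be an ace. So, the events are mutually exclusive, as shown in the Venn diagram. The probability of selecting a 4 or an ace is

\[ P(4 \text{ or ace}) = P(4) + P(\text{ace}) = \frac{4}{52} + \frac{4}{52} = \frac{8}{52} = \frac{2}{13} \approx 0.154. \]

[Figure: Venn diagram, "Deck of 52 Cards," showing the four 4s and the four aces as non-overlapping sets, with the 44 other cards outside both.]

2. The events are not mutually exclusive because 1 is an outcome of both events, as shown in the Venn diagram. So, the probability of rolling a number less than 3 or an odd number is

\[ \begin{aligned} P(\text{less than 3 or odd}) &= P(\text{less than 3}) + P(\text{odd}) - P(\text{less than 3 and odd}) \\ &= \frac{2}{6} + \frac{3}{6} - \frac{1}{6} = \frac{4}{6} = \frac{2}{3} \approx 0.667. \end{aligned} \]

[Figure: Venn diagram, "Roll a Die," showing the overlapping events "Less than three" and "Odd."]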
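Both parts of this example can be verified by counting outcomes. The Python sketch below applies the Addition Rule to sets of outcomes and, for the die, compares the result with a direct count of the union. The helper names `prob` and `addition_rule` and the (rank, suit) representation of the deck are assumptions made for illustration.

```python
from fractions import Fraction


def prob(event, sample_space):
    """Classical probability: outcomes in the event / outcomes in the sample space."""
    return Fraction(len(event & sample_space), len(sample_space))


def addition_rule(a, b, sample_space):
    """P(A or B) = P(A) + P(B) - P(A and B)."""
    return prob(a, sample_space) + prob(b, sample_space) - prob(a & b, sample_space)


# Part 2: rolling a die
die = {1, 2, 3, 4, 5, 6}
less_than_3 = {1, 2}
odd = {1, 3, 5}
print(addition_rule(less_than_3, odd, die))  # 2/3
print(prob(less_than_3 | odd, die))          # 2/3, counting the union directly

# Part 1: a standard deck built as (rank, suit) pairs
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = {(r, s) for r in ranks for s in suits}
fours = {card for card in deck if card[0] == "4"}
aces = {card for card in deck if card[0] == "A"}
print(addition_rule(fours, aces, deck))      # 2/13
```

For the cards, the overlap term is 0 because the events are mutually exclusive, so the general rule reduces to the simplified one.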


Picturing the World

A survey of 1520 U.S. adults ages 18 and older asked them whether they had a smartphone, a tablet computer, or a home broadband subscription. Overall, 39% said they have all three; 28% said they have two of the three; 17% said they have one of the three; and 16% said they have none of them, as shown in the pie chart. (Source: Pew Research)

[Figure: pie chart, "Do you have a smartphone, tablet computer, or a home broadband subscription?" showing the four responses above, with "All three" at 39%.]

A U.S. adult is selected at random. What is the probability that when asked whether the adult has a smartphone, a tablet computer, or a home broadband subscription, the response will be “none of them” or “one of the three”?




TRY IT YOURSELF 2
1. A die is rolled. Find the probability of rolling a 6 or an odd number.

2. A card is selected from a standard deck of 52 playing cards. Find the probability that the card is a face card or a heart.

Answer: Page A34

Finding Probabilities of Mutually Exclusive Events


The frequency distribution shows volumes of sales (in dollars) and the number 
of months in which a sales representative reached each sales level during the 
past three years. Using this sales pattern, find the probability that the sales 
representative will sell between $75,000 and $124,999 next month. 


Sales volume (in dollars) | Months
0–24,999 | 3
25,000–49,999 |
50,000–74,999 |
75,000–99,999 | 7
100,000–124,999 | 9
125,000–149,999 |
150,000–174,999 |
175,000–199,999 |

SOLUTION


To solve this problem, define events A and B as

A = {monthly sales between $75,000 and $99,999}

and

B = {monthly sales between $100,000 and $124,999}.

The events are mutually exclusive, as shown in the Venn diagram.


[Figure: Venn diagram, "Monthly Sales Volume," showing the non-overlapping events "Monthly sales between $75,000 and $99,999" and "Monthly sales between $100,000 and $124,999," with all other monthly sales outside both.]


Because events A and B are mutually exclusive, the probability that the sales representative will sell between $75,000 and $124,999 next month is

\[ P(A \text{ or } B) = P(A) + P(B) = \frac{7}{36} + \frac{9}{36} = \frac{16}{36} = \frac{4}{9} \approx 0.444. \]

TRY IT YOURSELF 3
Find the probability that the sales representative will sell between $0 and $49,999.

Answer: Page A34




Using the Addition Rule to Find Probabilities 


A blood bank catalogs the types of blood, including whether it is Rh-positive 
or Rh-negative, given by donors during the last five days. The number of 
donors who gave each blood type is shown in the table. 


1. Find the probability that a donor selected at random has type O or type A 
blood. 


2. Find the probability that a donor selected at random has type B blood or is 
Rh-negative. 


Blood type

Rh-factor | O | A | B | AB | Total
Positive | 156 | 139 | 37 | 12 | 344
Negative | 28 | 25 | 8 | 4 | 65
Total | 184 | 164 | 45 | 16 | 409


SOLUTION

1. Because a donor cannot have type O blood and type A blood, these events are mutually exclusive. So, using the Addition Rule, the probability that a randomly chosen donor has type O or type A blood is

\[ \begin{aligned} P(\text{type O or type A}) &= P(\text{type O}) + P(\text{type A}) \\ &= \frac{184}{409} + \frac{164}{409} \\ &= \frac{348}{409} \\ &\approx 0.851. \end{aligned} \]

2. Because a donor can have type B blood and be Rh-negative, these events are not mutually exclusive. So, using the Addition Rule, the probability that a randomly chosen donor has type B blood or is Rh-negative is

\[ \begin{aligned} P(\text{type B or Rh-neg}) &= P(\text{type B}) + P(\text{Rh-neg}) - P(\text{type B and Rh-neg}) \\ &= \frac{45}{409} + \frac{65}{409} - \frac{8}{409} \\ &= \frac{102}{409} \\ &\approx 0.249. \end{aligned} \]
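The same two answers can be computed straight from the table. This Python sketch stores the donor counts in a nested dictionary (the layout and the helper names `p_type` and `p_rh` are assumptions for illustration) and applies the Addition Rule with and without the overlap term.

```python
from fractions import Fraction

# Donor counts from the blood bank table: counts[rh_factor][blood_type]
counts = {
    "positive": {"O": 156, "A": 139, "B": 37, "AB": 12},
    "negative": {"O": 28, "A": 25, "B": 8, "AB": 4},
}
total = sum(sum(row.values()) for row in counts.values())  # 409 donors


def p_type(blood_type):
    """Probability that a randomly selected donor has the given blood type."""
    return Fraction(sum(row[blood_type] for row in counts.values()), total)


def p_rh(rh_factor):
    """Probability that a randomly selected donor has the given Rh-factor."""
    return Fraction(sum(counts[rh_factor].values()), total)


# Part 1: mutually exclusive events, so there is no overlap term
p_o_or_a = p_type("O") + p_type("A")
# Part 2: not mutually exclusive, so subtract the type B, Rh-negative donors
p_b_or_neg = p_type("B") + p_rh("negative") - Fraction(counts["negative"]["B"], total)

print(p_o_or_a, round(float(p_o_or_a), 3))      # 348/409 0.851
print(p_b_or_neg, round(float(p_b_or_neg), 3))  # 102/409 0.249
```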


TRY IT YOURSELF 4 


1. Find the probability that a donor selected at random has type B or type AB 
blood. 


2. Find the probability that a donor selected at random does not have type O 
or type A blood. 


3. Find the probability that a donor selected at random has type O blood or 
is Rh-positive. 


4. Find the probability that a donor selected at random has type A blood or 
is Rh-negative. 
Answer: Page A34 


A Summary of Probability

Type of probability and probability rules | In words | In symbols
Classical Probability | The number of outcomes in the sample space is known and each outcome is equally likely to occur. | P(E) = (Number of outcomes in event E) / (Number of outcomes in sample space)
Empirical Probability | The frequency of each outcome in the sample space is estimated from experimentation. | P(E) = (Frequency of event E) / (Total frequency) = f/n
Range of Probabilities Rule | The probability of an event is between 0 and 1, inclusive. | 0 ≤ P(E) ≤ 1
Complementary Events | The complement of event E is the set of all outcomes in a sample space that are not included in E, and is denoted by E′. | P(E′) = 1 − P(E)
Multiplication Rule | The Multiplication Rule is used to find the probability of two events occurring in sequence. | P(A and B) = P(A) · P(B|A) (dependent events); P(A and B) = P(A) · P(B) (independent events)
Addition Rule | The Addition Rule is used to find the probability of at least one of two events occurring. | P(A or B) = P(A) + P(B) − P(A and B); P(A or B) = P(A) + P(B) (mutually exclusive events)


Combining Rules to Find Probabilities
Use the figure at the right to find the probability that a randomly selected draft pick is not a running back or a wide receiver.

[Figure: "NFL Rookies," a breakdown by position of the 253 players picked in the 2016 NFL draft. (Source: National Football League)]

SOLUTION
Define events A and B.

A: Draft pick is a running back.
B: Draft pick is a wide receiver.

These events are mutually exclusive, so the probability that the draft pick is a running back or wide receiver is

\[ P(A \text{ or } B) = P(A) + P(B) = \frac{23}{253} + \frac{31}{253} = \frac{54}{253} \approx 0.213. \]

By taking the complement of P(A or B), you can determine that the probability of randomly selecting a draft pick who is not a running back or wide receiver is

\[ 1 - P(A \text{ or } B) = 1 - \frac{54}{253} = \frac{199}{253} \approx 0.787. \]
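The two steps in this solution (add the mutually exclusive probabilities, then take the complement) are easy to mirror in code. A short Python sketch using the counts from the example; the variable names are illustrative.

```python
from fractions import Fraction

TOTAL_PICKS = 253      # players picked in the 2016 NFL draft
running_backs = 23
wide_receivers = 31

# Mutually exclusive positions, so P(A or B) = P(A) + P(B)
p_rb_or_wr = Fraction(running_backs, TOTAL_PICKS) + Fraction(wide_receivers, TOTAL_PICKS)
# Complement rule: P(neither) = 1 - P(A or B)
p_neither = 1 - p_rb_or_wr

print(p_rb_or_wr, round(float(p_rb_or_wr), 3))  # 54/253 0.213
print(p_neither, round(float(p_neither), 3))    # 199/253 0.787
```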


TRY IT YOURSELF 5 
Find the probability that a randomly selected draft pick is not a linebacker or 
a quarterback. 

Answer: Page A34 




3.3 EXERCISES 


For Extra Help: MyLab Statistics 


Building Basic Skills and Vocabulary 


1. When two events are mutually exclusive, why is P(A and B) = 0?

2. Give an example of (a) two events that are mutually exclusive and (b) two events that are not mutually exclusive.


True or False? In Exercises 3–6, determine whether the statement is true or false. If it is false, explain why.

3. When two events are mutually exclusive, they have no outcomes in common.
4. When two events are independent, they are also mutually exclusive.
5. The probability that event A or event B will occur is P(A or B) = P(A) + P(B) + P(A and B).
6. If events A and B are mutually exclusive, then P(A or B) = P(A) + P(B).


Graphical Analysis In Exercises 7 and 8, determine whether the events shown 
in the Venn diagram are mutually exclusive. Explain your reasoning. 


7. [Figure: Venn diagram, Sample Space: Presidential Candidates, with the sets "Presidential candidates who lost" and "Presidential candidates who won the election."]

8. [Figure: Venn diagram, Sample Space: Movies, with the set "Movies that are rated R."]


Using and Interpreting Concepts 


Recognizing Mutually Exclusive Events In Exercises 9–12, determine whether the events are mutually exclusive. Explain your reasoning.

9. Event A: Randomly select a student who studies for more than 5 hours daily.
Event B: Randomly select a student who studies for less than 2 hours daily.

10. Event A: Randomly select a student with a birthday in April.
Event B: Randomly select a student with a birthday in May.

11. Event A: Randomly select a female badminton player.
Event B: Randomly select a badminton player who is 25 years old.

12. Event A: Randomly select a member of the U.S. Congress.
Event B: Randomly select a male U.S. Senator.

13. Students A class has 60 students. Of these, 35 students are male and 20 students know how to play a musical instrument. Of the male students, five can play musical instruments. Find the probability that a randomly selected student is male or can play a musical instrument.

14. Conference A teaching conference has an attendance of 6855 people. Of these, 3120 are college professors and 3595 are male. Of the college professors, 1505 are male. Find the probability that a randomly selected attendee is male or a college professor.


15. Mobile Defects Of the mobile phones produced by a company, 97% do not have a poor battery life, 95% do not have a corrupt operating system, and 93.5% do not have a poor battery life and do not have a corrupt OS. Find the probability that a randomly selected mobile phone does not have a poor battery life or a corrupt OS.

16. Camera Defects Of the cameras produced by a company, 8% have a flash problem, 12% have a focus malfunction, and 0.9% have both a flash issue and a focus malfunction. Find the probability that a randomly selected camera has a flash problem or a focus malfunction.

17. Rolling a Die You roll a die. Find the probability of each event.

(a) Rolling a 6 or a number greater than 4
(b) Rolling a 2 or a prime number
(c) Rolling a number less than 5 or an odd number

18. Selecting a Card A card is selected at random from a standard deck of 52 playing cards. Find the probability of each event.

(a) Randomly selecting a black suit or a king
(b) Randomly selecting a diamond or a face card
(c) Randomly selecting a club or a diamond

19. U.S. Age Distribution The estimated percent distribution of the U.S. population for 2025 is shown in the pie chart. Find the probability of each event. (Source: U.S. Census Bureau)


(a) Randomly selecting someone who is under 5 years old 

(b) Randomly selecting someone who is 45 years or over 

(c) Randomly selecting someone who is not 65 years or over 

(d) Randomly selecting someone who is between 20 and 34 years old 


[Figure for Exercise 19: pie chart, "U.S. Age Distribution," showing the estimated percent distribution of the 2025 U.S. population by age group.]

[Figure for Exercise 20: pie chart, "Marijuana Use in the Last 30 Days," showing the percent of college students in each usage category.]


20. Marijuana Use The percent of college students’ marijuana use for a sample of 95,761 students is shown in the pie chart. Find the probability of each event. (Source: American College Health Association)


(a) Randomly selecting a student who never used marijuana 

(b) Randomly selecting a student who used marijuana 

(c) Randomly selecting a student who used marijuana between 1 and 29 of 
the last 30 days 

(d) Randomly selecting a student who used marijuana on at least 1 of the 
last 30 days 




[Figure for Exercise 21: Pareto chart, "How Would You Grade the Media for the Way They Conducted Themselves in the 2016 Presidential Campaign?" (number responding by response).]

[Figure for Exercise 22: Pareto chart, "How Important Is the Brexit Story to You Personally?" (number responding by response).]


21. Media Conduct The responses of 1254 voters to a survey about the way the media conducted themselves in the 2016 presidential campaign are shown in the Pareto chart. Find the probability of each event. (Adapted from Pew Research Center)

(a) Randomly selecting a person from the sample who did not give the media an A or a B
(b) Randomly selecting a person from the sample who gave the media a grade better than a D
(c) Randomly selecting a person from the sample who gave the media a D or an F
(d) Randomly selecting a person from the sample who gave the media a C or a D

22. Brexit The responses of 1007 American adults to a survey question about the story of Britons’ vote to leave the European Union are shown in the Pareto chart. Find the probability of each event. (Adapted from GfK Public Affairs and Corporate Communications)

(a) Randomly selecting an adult who thinks the story is somewhat important
(b) Randomly selecting an adult who thinks the story is not at all important
(c) Randomly selecting an adult who thinks the story is not too important or not at all important
(d) Randomly selecting an adult who thinks the story is extremely important or very important

23. Business Degrees The table shows the numbers of male and female students in the U.S. who received bachelor’s degrees in business in a recent year. A student is selected at random. Find the probability of each event. (Source: National Center for Educational Statistics)

Business degrees Nonbusiness degrees Total
Males 191,310 621,359 812,669
Females 172,489 909,776 1,082,265
Total 363,799 1,531,135 1,894,934

(a) The student is male or received a business degree.
(b) The student is female or received a nonbusiness degree.
(c) The student is not female or received a nonbusiness degree.

24. Education Tax The table shows the results of a survey that asked 506 Maine adults whether they favored or opposed a tax to fund education. A person is selected at random from the sample. Find the probability of each event. (Adapted from Portland Press Herald)

Support Oppose Unsure Total
Males 128 99 20 247
Females 173 65 21 259
Total 301 164 41 506

(a) The person opposes the tax or is female.
(b) The person supports the tax or is male.
(c) The person is not unsure or is female.


[Figure for Exercises 27 and 28: Venn diagram of three overlapping events A, B, and C.]




25. Charity The table shows the results of a survey that asked 2850 people 
whether they were involved in any type of charity work. A person is selected 
at random from the sample. Find the probability of each event. 


Frequently Occasionally Not at all Total


Males 221 456 795 1472 
Females 207 430 741 1378 
Total 428 886 1536 2850 


(a) The person is male or frequently involved in charity work. 
(b) The person is female or not involved in charity work at all. 
(c) The person is frequently or occasionally involved in charity work. 
(d) The person is female or not frequently involved in charity work. 
26. Eye Survey The table shows the results of a survey that asked 4807 people whether they wore contacts or glasses. A person is selected at random from the sample. Find the probability of each event.

Only Contacts Only Glasses Both Neither Total
Males 96 1262 266 684 2308
Females 284 641 552 1022 2499
Total 380 1903 818 1706 4807

(a) The person is female or wears both contacts and glasses.
(b) The person is male or wears neither of the two.
(c) The person is male or does not wear contacts.
(d) The person wears only contacts or only glasses.


Extending Concepts 


Addition Rule for Three Events The Addition Rule for the probability that 
event A or B or C will occur, P(A or B or C), is given by 


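\[ \begin{aligned} P(A \text{ or } B \text{ or } C) = {} & P(A) + P(B) + P(C) - P(A \text{ and } B) - P(A \text{ and } C) \\ & - P(B \text{ and } C) + P(A \text{ and } B \text{ and } C). \end{aligned} \]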


In the Venn diagram shown at the left, P(A or B or C) is represented by the blue 
areas. In Exercises 27 and 28, find P(A or B or C). 
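One way to see why the formula works is to compare it with a direct count on a small sample space. The Python sketch below uses a hypothetical experiment (one roll of a 12-sided die with three overlapping events, not Exercise 27 or 28) and shows that the seven-term formula matches the number of outcomes in the union.

```python
from fractions import Fraction


def prob(event, space):
    """Classical probability of an event that is a subset of the sample space."""
    return Fraction(len(event), len(space))


def union_three(a, b, c, space):
    """P(A or B or C) by the Addition Rule for three events."""
    return (prob(a, space) + prob(b, space) + prob(c, space)
            - prob(a & b, space) - prob(a & c, space) - prob(b & c, space)
            + prob(a & b & c, space))


# Hypothetical example: one roll of a 12-sided die
space = set(range(1, 13))
a = {n for n in space if n % 2 == 0}  # even number
b = {n for n in space if n % 3 == 0}  # multiple of 3
c = {n for n in space if n <= 6}      # at most 6

print(union_three(a, b, c, space))  # 5/6
print(prob(a | b | c, space))       # 5/6, counting the union directly
```

Here the outcome 6 lies in all three events: it is added three times, subtracted three times, and then added back once by the P(A and B and C) term.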


27. P(A) = 0.40, P(B) = 0.10, P(C) = 0.50,
P(A and B) = 0.05, P(A and C) = 0.25, P(B and C) = 0.10,
P(A and B and C) = 0.03

28. P(A) = 0.38, P(B) = 0.26, P(C) = 0.14,
P(A and B) = 0.12, P(A and C) = 0.03, P(B and C) = 0.09,
P(A and B and C) = 0.01

29. Explain, in your own words, why in the Addition Rule for P(A or B or C), 
P(A and B and C) is added at the end of the formula. 


30. Writing Can two events with nonzero probabilities be both independent 
and mutually exclusive? Explain your reasoning. 


Simulating the Probability of Rolling a 3 or 4

ACTIVITY

APPLET You can find the interactive applet for this activity within MyLab Statistics or at www.pearsonglobaleditions.com.

The Simulating the Probability of Rolling a 3 or 4 applet allows you to investigate the probability of rolling a 3 or 4 on a fair die. The plot at the top left corner shows the probability associated with each outcome of a die roll. When ROLL is clicked, n simulations of the experiment of rolling a die are performed. The results of the simulations are shown in the frequency plot. When the animate option is checked, the display will show each outcome dropping into the frequency plot as the simulation runs. The individual outcomes are shown in the text field at the far right of the applet. The center plot shows in blue the cumulative proportion of times that an event of rolling a 3 or 4 occurs. The green line in the plot reflects the theoretical probability of rolling a 3 or 4. As the experiment is conducted over and over, the cumulative proportion should converge to the theoretical probability.
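If the applet is not available, the same idea can be sketched in Python. This is not the applet itself, only a stand-in under the assumption that Python's pseudorandom `random.randint` is an acceptable model of a fair die. It records the cumulative proportion of 3s or 4s after each roll, which plays the role of the blue curve; the theoretical value 2/6 plays the role of the green line. The function name and seed are illustrative.

```python
import random


def simulate_3_or_4(n, seed=None):
    """Roll a fair die n times; return the cumulative proportion of 3s or 4s after each roll."""
    rng = random.Random(seed)
    hits = 0
    proportions = []
    for i in range(1, n + 1):
        roll = rng.randint(1, 6)  # each face 1 through 6 is equally likely
        if roll in (3, 4):
            hits += 1
        proportions.append(hits / i)
    return proportions


theoretical = 2 / 6  # two favorable outcomes out of six
props = simulate_3_or_4(10_000, seed=1)
print(f"after 10 rolls: {props[9]:.4f}")
print(f"after 10,000 rolls: {props[-1]:.4f} (theoretical {theoretical:.4f})")
```

Early proportions can be far from 1/3; with thousands of rolls they settle near it, which is the convergence the activity describes.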


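[Figure: screenshot of the applet, showing the probability plot for the outcomes 1 through 6, the frequency plot, the cumulative proportion plot, and the ROLL, RESET, and Animate controls.]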


Step 1 Specify a value for n.
Step 2 Click ROLL four times.
Step 3 Click RESET.
Step 4 Specify another value for n.
Step 5 Click ROLL.


DRAW CONCLUSIONS 


1. Run the simulation using each value of n one time. Clear the results after each 
trial. Compare the cumulative proportion of “rolls” that result in a 3 or 4 for 
each trial with the theoretical probability of rolling a 3 or 4. 




2. You want to modify the applet so you can find the probability of rolling a 
number less than 4. Describe the placement of the green line. 




United States Congress 


Congress is made up of the House of Representatives and the Senate. Members 
of the House of Representatives serve two-year terms and represent a district in 
a state. The number of representatives for each state is determined by population. 
States with larger populations have more representatives than states with smaller 
populations. The total number of representatives is set by law at 435. Members of 
the Senate serve six-year terms and represent a state. Each state has 2 senators, 
for a total of 100 senators. The tables show the makeup of the 115th Congress by gender and political party as of January 3, 2017.


House of Representatives 


Political party 
Republican Democrat Independent Total
Male 218 129 0 347 
Gender Female 23 65 0 88 
Total 241 194 0 435 

Senate 

Political party 
Republican Democrat Independent Total 
Male 47 30 2 79 
Gender Female 5 16 0 21 
Total 52 46 2 100 


EXERCISES 


1. Find the probability that a randomly selected 
representative is female. Find the probability 
that a randomly selected senator is female. 


2. Compare the probabilities from Exercise 1. 


3. A representative is selected at random. Find the 
probability of each event. 


(a) The representative is male. 
(b) The representative is a Republican. 


(c) The representative is male given that the 
representative is a Republican. 


(d) The representative is female and a 
Democrat. 


4. Among members of the House of 
Representatives, are the events “being female” 
and “being a Democrat” independent or 
dependent events? Explain. 


5. A senator is selected at random. Find the probability of each event.

(a) The senator is male.
(b) The senator is not a Democrat.
(c) The senator is female or a Republican.
(d) The senator is male or a Democrat.

6. Among members of the Senate, are the events “being female” and “being an Independent” mutually exclusive? Explain.

7. Using the same row and column headings as the tables above, create a combined table for Congress.

8. A member of Congress is selected at random. Use the table from Exercise 7 to find the probability of each event.

(a) The member is Independent.
(b) The member is female and a Republican.
(c) The member is male or a Democrat.


Case Study 189 




3.4 Additional Topics in Probability and Counting


What You Should Learn 


• How to find the number of ways a group of objects can be arranged in order

• How to find the number of ways to choose several objects from a group without regard to order

• How to use counting principles to find probabilities


Study Tip
Notice that small values of n can produce very large values of n!. For instance, 10! = 3,628,800. Be sure you know how to use the factorial key on your calculator.


[Figure: Sudoku Number Puzzle]


Permutations • Combinations • Applications of Counting Principles


Permutations 


In Section 3.1, you learned that the Fundamental Counting Principle is used to 
find the number of ways two or more events can occur in sequence. An important 
application of the Fundamental Counting Principle is determining the number of 
ways that n objects can be arranged in order. An ordering of n objects is called 
a permutation. 


DEFINITION 


A permutation is an ordered arrangement of objects. The number of different permutations of n distinct objects is n!.


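The expression n! is read as n factorial. If n is a positive integer, then n! is defined as follows.

\[ n! = n \cdot (n-1) \cdot (n-2) \cdot (n-3) \cdots 3 \cdot 2 \cdot 1 \]

As a special case, 0! = 1. Here are several other values of n!.

\[ 1! = 1 \qquad 2! = 2 \cdot 1 = 2 \qquad 3! = 3 \cdot 2 \cdot 1 = 6 \qquad 4! = 4 \cdot 3 \cdot 2 \cdot 1 = 24 \]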


Finding the Number of Permutations of n Objects 


The objective of a 9 × 9 Sudoku number puzzle is to fill the grid so that each row, each column, and each 3 × 3 grid contain the digits 1 through 9. How many different ways can the first row of a blank 9 × 9 Sudoku grid be filled?

SOLUTION
The number of permutations is

\[ 9! = 9 \cdot 8 \cdot 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1 = 362{,}880. \]

So, there are 362,880 different ways the first row can be filled.
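The definition of n! translates directly into a loop. This Python sketch (the function name `factorial` is illustrative) multiplies 2 · 3 · … · n, treats 0! = 1 as the special case, and checks the Sudoku count against the standard library's `math.factorial`.

```python
import math


def factorial(n):
    """n! = n * (n - 1) * ... * 2 * 1, with 0! = 1 as a special case."""
    if n < 0:
        raise ValueError("n must be a nonnegative integer")
    result = 1  # the empty product, so factorial(0) and factorial(1) return 1
    for k in range(2, n + 1):
        result *= k
    return result


print([factorial(n) for n in range(5)])   # [1, 1, 2, 6, 24]
print(factorial(9))                       # 362880 ways to fill the first Sudoku row
print(factorial(9) == math.factorial(9))  # True
```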


TRY IT YOURSELF 1 


The Big 12 is a collegiate athletic conference with 10 schools: Baylor, Iowa State, Kansas, Kansas State, Oklahoma, Oklahoma State, TCU, Texas, Texas Tech, and West Virginia. How many different final standings are possible for the Big 12’s football teams? Answer: Page A34


You may want to choose some of the objects in a group and put them in 
order. Such an ordering is called a permutation of n objects taken r at a time. 


Permutations of n Objects Taken r at a Time
The number of permutations of n distinct objects taken r at a time is

\[ {}_nP_r = \frac{n!}{(n-r)!}, \quad \text{where } r \le n. \]

Tech Tip
You can use technology such as Minitab, Excel, StatCrunch, or the TI-84 Plus to find the number of permutations of n objects taken r at a time. For instance, here is how to find 10P4 in Example 2 on a TI-84 Plus.

1. Enter the total number of objects, n = 10.
2. Press MATH, choose the PRB menu, and select 2: nPr.
3. Enter the number of objects taken, r = 4.
4. Press ENTER.

TI-84 PLUS screen:
10 nPr 4
5040




Finding nPr

Find the number of ways of forming four-digit codes in which no digit is repeated.

SOLUTION

To form a four-digit code with no repeating digits, you need to select 4 digits from a group of 10, so n = 10 and r = 4.

\[ \begin{aligned} {}_nP_r = {}_{10}P_4 &= \frac{10!}{(10-4)!} \\ &= \frac{10!}{6!} \\ &= \frac{10 \cdot 9 \cdot 8 \cdot 7 \cdot 6!}{6!} \\ &= 5040 \end{aligned} \]


So, there are 5040 possible four-digit codes that do not have repeating digits. 


TRY IT YOURSELF 2 

A psychologist shows a list of eight activities to a subject in an experiment. 

How many ways can the subject pick a first, second, and third activity? 
Answer: Page A34 


EXAMPLE 3

Finding $_{n}P_{r}$
Each year, 33 race cars start the Indianapolis 500. How many ways can the cars 
finish first, second, and third? 


SOLUTION 
You need to select three race cars from a group of 33, so
n = 33 and r = 3.


Because the order is important, the number of ways the cars can finish first, 
second, and third is 

$_{n}P_{r} = {}_{33}P_{3} = \dfrac{33!}{(33-3)!} = \dfrac{33!}{30!} = \dfrac{33 \cdot 32 \cdot 31 \cdot 30!}{30!} = 32{,}736.$

TRY IT YOURSELF 3 


The board of directors of a company has 12 members. One member is the 
president, another is the vice president, another is the secretary, and another 
is the treasurer. How many ways can these positions be assigned? 

Answer: Page A34 


In Example 3, note that the Fundamental Counting Principle can be used to
obtain the same result. There are 33 choices for first place, 32 choices for second
place, and 31 choices for third place. So, there are


$33 \cdot 32 \cdot 31 = 32{,}736$


ways the cars can finish first, second, and third. 
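The agreement between the two methods can be sketched in Python as follows, assuming Python 3.8 or later for `math.perm`. The helper `fundamental_count` is a hypothetical name that simply multiplies the number of choices available at each stage.

```python
import math


def fundamental_count(choices):
    """Fundamental Counting Principle: multiply the number of choices at each stage."""
    total = 1
    for c in choices:
        total *= c
    return total


# 33 cars can finish first; 32 remain for second; 31 remain for third.
ways = fundamental_count([33, 32, 31])
print(ways)                      # 32736
print(ways == math.perm(33, 3))  # True
```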


192 CHAPTER 3. Probability 


You may want to order a group of n objects in which some of the objects are 
the same. For instance, consider the group of letters 


AAAABBC. 


This group has four A’s, two B’s, and one C. How many ways can you order such
a group? Using the formula for $_{n}P_{r}$, you might conclude that there are

$_{7}P_{7} = 7! = 5040$


possible orders. However, because some of the objects are the same, not all of 
these permutations are distinguishable. How many distinguishable permutations 
are possible? The answer can be found using the formula for the number of 
distinguishable permutations. 


Distinguishable Permutations 


The number of distinguishable permutations of n objects, where $n_1$ are of one
type, $n_2$ are of another type, and so on, is

$\dfrac{n!}{n_1! \cdot n_2! \cdot n_3! \cdots n_k!}$

where $n_1 + n_2 + n_3 + \cdots + n_k = n$.


Using the formula for distinguishable permutations, you can determine that 
the number of distinguishable permutations of the letters AAAABBC is 


$\dfrac{7!}{4! \cdot 2! \cdot 1!} = \dfrac{7 \cdot 6 \cdot 5}{2} = 105$ distinguishable permutations.
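A minimal Python sketch of the distinguishable-permutation formula is shown below. The helper name `distinguishable_permutations` is hypothetical. It counts each letter with `collections.Counter`, divides n! by the factorial of each count, and then checks the answer by generating all 7! orderings of AAAABBC and keeping only the distinct ones.

```python
import math
from collections import Counter
from itertools import permutations


def distinguishable_permutations(word):
    """n! divided by (count of each repeated letter)! for every letter type."""
    denominator = 1
    for count in Counter(word).values():
        denominator *= math.factorial(count)
    return math.factorial(len(word)) // denominator


formula = distinguishable_permutations("AAAABBC")
# Brute force: all 5040 orderings, collapsed to the distinct ones.
brute = len(set(permutations("AAAABBC")))
print(formula, brute)  # 105 105
```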


EXAMPLE 4

Finding the Number of Distinguishable Permutations
A building contractor is planning to develop a subdivision. The subdivision is
to consist of 6 one-story houses, 4 two-story houses, and 2 split-level houses.
In how many distinguishable ways can the houses be arranged? 


SOLUTION 


There are to be 12 houses in the subdivision, 6 of which are of one type
(one-story), 4 of another type (two-story), and 2 of a third type (split-level).
So, there are

$\dfrac{12!}{6! \cdot 4! \cdot 2!} = \dfrac{12 \cdot 11 \cdot 10 \cdot 9 \cdot 8 \cdot 7 \cdot 6!}{6! \cdot 4! \cdot 2!} = 13{,}860$ distinguishable ways.

[TI-84 PLUS screen: 12!/(6!*4!*2!) = 13860]


You can check your answer using technology, as shown at the left on a 
TI-84 Plus. 


Interpretation There are 13,860 distinguishable ways to arrange the houses in 
the subdivision. 
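Another way to see the same count, offered here as an illustration rather than as the textbook's method, is to choose lots for each house type in turn: pick 6 of the 12 lots for one-story houses, then 4 of the remaining 6 for two-story houses, and the last 2 go to split-level houses. This Python sketch assumes Python 3.8 or later for `math.comb` and compares that approach with the formula used above.

```python
import math

# Choose lots type by type: 6 of 12, then 4 of the remaining 6, then 2 of the last 2.
by_positions = math.comb(12, 6) * math.comb(6, 4) * math.comb(2, 2)

# Distinguishable-permutation formula: 12! / (6! * 4! * 2!)
by_formula = math.factorial(12) // (
    math.factorial(6) * math.factorial(4) * math.factorial(2)
)

print(by_positions, by_formula)  # 13860 13860
```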


TRY IT YOURSELF 4 


The contractor wants to plant six oak trees, nine maple trees, and five poplar 
trees along the subdivision street. The trees are to be spaced evenly. In how 
many distinguishable ways can they be planted? 

Answer: Page A34 


Tech Tip 


You can use technology such as Minitab, Excel, StatCrunch, or the TI-84 Plus to find the number of combinations of n objects taken r at a time. For instance, here is how to find $_{16}C_{4}$ in Example 5 on a TI-84 Plus.

Enter the total number of objects, n = 16.
Press MATH and choose the PRB menu, then select 3: nCr.
Enter the number of objects taken, r = 4.
Press ENTER.

[TI-84 PLUS screen: 16 nCr 4 = 1820]




Combinations 


A state park manages five beaches labeled A, B, C, D, and E. Due to budget 
constraints, new restrooms will be built at only three beaches. There are 10 ways 
for the state to select the three beaches. 


ABC, ABD, ABE, ACD, ACE, ADE, BCD, BCE, BDE, CDE 


In each selection, order does not matter (ABC is the same as BAC). The number 
of ways to choose r objects from n objects without regard to order is called the 
number of combinations of n objects taken r at a time. 
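The ten beach selections listed above can be generated mechanically. This minimal Python sketch uses `itertools.combinations`, which produces unordered selections (so ABC appears once, never as BAC) in the same order as the list in the text.

```python
from itertools import combinations

beaches = "ABCDE"
selections = ["".join(c) for c in combinations(beaches, 3)]
print(selections)
# ['ABC', 'ABD', 'ABE', 'ACD', 'ACE', 'ADE', 'BCD', 'BCE', 'BDE', 'CDE']
print(len(selections))  # 10
```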


Combinations of n Objects Taken r at a Time 


The number of combinations of r objects selected from a group of n objects
without regard to order is

$_{n}C_{r} = \dfrac{n!}{(n-r)!\,r!}$, where $r \le n$.


You can think of a combination of n objects chosen r at a time as a 
permutation of n objects in which the r selected objects are alike and the 
remaining n − r (not selected) objects are alike.
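The "alike objects" view can be checked with a short Python sketch. The helper names `n_c_r` and `alike_arrangements` are hypothetical: the first applies the combination formula, and the second counts the distinct rows made of r "S" (selected) marks and n − r "N" (not selected) marks, which is a distinguishable-permutation count.

```python
import math
from itertools import permutations


def n_c_r(n, r):
    """Combinations of n objects taken r at a time: n! / ((n - r)! * r!)."""
    return math.factorial(n) // (math.factorial(n - r) * math.factorial(r))


def alike_arrangements(n, r):
    """Distinct rows of r 'S' (selected) marks and n - r 'N' (not selected) marks."""
    return len(set(permutations("S" * r + "N" * (n - r))))


# Five beaches, three chosen: both views give the same count.
print(n_c_r(5, 3), alike_arrangements(5, 3))  # 10 10
```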


EXAMPLE 5

Finding the Number of Combinations


A state’s department of transportation plans to develop a new section of 
interstate highway and receives 16 bids for the project. The state plans to 
hire four of the bidding companies. How many different combinations of four 
companies can be selected from the 16 bidding companies? 


SOLUTION 
The state is selecting four companies from a group of 16, so
n = 16 and r = 4. Because order is not important, there are

$_{n}C_{r} = {}_{16}C_{4} = \dfrac{16!}{(16-4)!\,4!} = \dfrac{16!}{12!\,4!} = \dfrac{16 \cdot 15 \cdot 14 \cdot 13 \cdot 12!}{12! \cdot 4!} = 1820$ different combinations.


Interpretation There are 1820 different combinations of four companies that 
can be selected from the 16 bidding companies. 
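The cancellation in the solution can be mirrored in a few lines of Python (assuming Python 3.8 or later for `math.comb`). The sketch divides the four remaining factors 16 · 15 · 14 · 13 by 4! and compares the result with the built-in combination function.

```python
import math

bids, hired = 16, 4

# After cancelling 12!, only 16 * 15 * 14 * 13 remains over 4!.
by_hand = (16 * 15 * 14 * 13) // math.factorial(hired)
ways = math.comb(bids, hired)  # built-in nCr

print(by_hand, ways)  # 1820 1820
```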


TRY IT YOURSELF 5 


The manager of an accounting department wants to form a three-person 
advisory committee from the 20 employees in the department. In how many 
ways can the manager form this committee? 

Answer: Page A34 




Applications of Counting Principles 


The table summarizes the counting principles. 


Principle | Description | Formula
Fundamental Counting Principle | If one event can occur in m ways and a second event can occur in n ways, then the number of ways the two events can occur in sequence is m · n. | $m \cdot n$
Permutations | The number of permutations of n distinct objects | $n!$
Permutations | The number of permutations of n distinct objects taken r at a time, where $r \le n$ | $_{n}P_{r} = \dfrac{n!}{(n-r)!}$
Permutations | The number of distinguishable permutations of n objects where $n_1$ are of one type, $n_2$ are of another type, and so on, and $n_1 + n_2 + n_3 + \cdots + n_k = n$ | $\dfrac{n!}{n_1! \cdot n_2! \cdot n_3! \cdots n_k!}$
Combinations | The number of combinations of r objects selected from a group of n objects without regard to order, where $r \le n$ | $_{n}C_{r} = \dfrac{n!}{(n-r)!\,r!}$


Study Tip

To solve a problem using a counting principle, be sure you choose the appropriate counting principle. To help you do this, consider these questions.

• Are there two or more separate events? Fundamental Counting Principle
• Is the order of the objects important? Permutation
• Are the chosen objects from a larger group of objects in which order is not important? Combination

Note that some problems may require you to use more than one counting principle (see Example 8).


EXAMPLE 6

Finding Probabilities


A student advisory board consists of 17 members. Three members will be 
chosen to serve as the board’s chair, secretary, and webmaster. Each member 
is equally likely to serve in any of the positions. What is the probability of 
randomly selecting the three members who will be chosen for the board? 


SOLUTION 

Note that order is important because the positions (chair, secretary, and
webmaster) are distinct objects. There is one favorable outcome and there are

$_{17}P_{3} = \dfrac{17!}{(17-3)!} = \dfrac{17!}{14!} = \dfrac{17 \cdot 16 \cdot 15 \cdot 14!}{14!} = 17 \cdot 16 \cdot 15 = 4080$

ways the three positions can be filled. So, the probability of correctly selecting
the three members who hold each position is

$P(\text{selecting the three members}) = \dfrac{1}{4080} \approx 0.0002.$
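For readers working in Python rather than Excel, a comparable check might look like the sketch below (assuming Python 3.8 or later for `math.perm`). It counts the ordered assignments of three distinct positions and keeps the probability as an exact fraction before rounding.

```python
import math
from fractions import Fraction

# Three distinct positions, so order matters: 17P3 possible assignments.
assignments = math.perm(17, 3)
p = Fraction(1, assignments)  # exactly one favorable assignment

print(assignments)          # 4080
print(round(float(p), 4))   # 0.0002
```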


You can check your answer using technology. For instance, using Excel’s
PERMUT command, you can find the probability of selecting the three
members, as shown at the left.

Excel worksheet, column A:
1 | PERMUT(17,3)
2 | 4080
3 | 1/A2
4 | 0.000245098


TRY IT YOURSELF 6

A student advisory board consists of 20 members. Two members will be chosen
to serve as the board’s chair and secretary. Each member is equally likely to
serve in either of the positions. What is the probability of randomly selecting
the two members who will be chosen for the board?

Answer: Page A34


Picturing the World


One of the largest lottery jackpots 
ever, $656 million, was won in the 
Mega Millions lottery. When the 
jackpot was won, five different 
numbers were chosen from 1 to 
56 and one number, the Mega 
Ball, was chosen from 1 to 46. 
The winning numbers are shown 
below. 


[Figure: the five winning numbers and the Mega Ball]


In 2013, the lottery changed its 
rules. Now, a player chooses five 
different numbers from 1 to 75 and 
one number from 1 to 15. A player 
wins the jackpot by matching all 
six winning numbers in a drawing. 


You purchase one ticket in the 
Mega Millions lottery. Find the 
probability of winning the jackpot 
using the old rules and the new 
rules. Which set of rules provides 
you with a better chance of 
winning the jackpot? How likely is 
your chance of winning? 
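One way to set up this comparison is sketched below in Python (assuming Python 3.8 or later for `math.comb`). It assumes, as the description states, that the five numbers are different and unordered, and that the single extra ball is chosen separately, so the Fundamental Counting Principle multiplies the two counts. The helper `jackpot_probability` is a hypothetical name.

```python
from fractions import Fraction
from math import comb


def jackpot_probability(main_max, ball_max):
    """One ticket: 5 different unordered numbers from 1..main_max, times 1 ball from 1..ball_max."""
    outcomes = comb(main_max, 5) * ball_max
    return Fraction(1, outcomes)


old_rules = jackpot_probability(56, 46)
new_rules = jackpot_probability(75, 15)

print(old_rules)              # 1/175711536
print(new_rules)              # 1/258890850
print(old_rules > new_rules)  # True
```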




EXAMPLE 7

Finding Probabilities
Find the probability of being dealt 5 diamonds from a standard deck of 
52 playing cards. 
SOLUTION 
In a standard deck of playing cards, 13 cards are diamonds. Note that the order
in which the cards are selected does not matter. The possible number of ways of
choosing 5 diamonds out of 13 is $_{13}C_{5}$. The number of possible five-card hands
is $_{52}C_{5}$. So, the probability of being dealt 5 diamonds is

$P(5 \text{ diamonds}) = \dfrac{_{13}C_{5}}{_{52}C_{5}} = \dfrac{1287}{2{,}598{,}960} \approx 0.0005.$
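The two combination counts and their ratio can be reproduced with a short Python sketch (assuming Python 3.8 or later for `math.comb`). `fractions.Fraction` keeps the probability exact until the final rounding.

```python
from fractions import Fraction
from math import comb

favorable = comb(13, 5)  # five-card hands made only of diamonds
possible = comb(52, 5)   # all five-card hands
p = Fraction(favorable, possible)

print(favorable, possible)  # 1287 2598960
print(round(float(p), 4))   # 0.0005
```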


TRY IT YOURSELF 7 


Find the probability of being dealt 5 diamonds from a standard deck of playing 
cards that also includes two jokers. In this case, the joker is considered to be a 
wild card that can be used to represent any card in the deck. 

Answer: Page A34 


EXAMPLE 8 


Finding Probabilities 


A food manufacturer is analyzing a sample of 400 corn kernels for the presence 
of a toxin. In this sample, three kernels have dangerously high levels of the toxin. 
Four kernels are randomly selected from the sample. What is the probability that 
exactly one kernel contains a dangerously high level of the toxin? 


SOLUTION 


Note that the order in which the kernels are selected does not matter. The possible
number of ways of choosing one toxic kernel out of three toxic kernels is $_{3}C_{1}$.
The possible number of ways of choosing 3 nontoxic kernels from 397 nontoxic
kernels is $_{397}C_{3}$. So, using the Fundamental Counting Principle, the number of
ways of choosing one toxic kernel and three nontoxic kernels is

$_{3}C_{1} \cdot {}_{397}C_{3} = 3 \cdot 10{,}349{,}790 = 31{,}049{,}370.$

The number of possible ways of choosing 4 kernels from 400 kernels is

$_{400}C_{4} = 1{,}050{,}739{,}900.$

So, the probability of selecting exactly 1 toxic kernel is

$P(1 \text{ toxic kernel}) = \dfrac{_{3}C_{1} \cdot {}_{397}C_{3}}{_{400}C_{4}} = \dfrac{31{,}049{,}370}{1{,}050{,}739{,}900} \approx 0.030.$
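The same computation, combining two combination counts with the Fundamental Counting Principle, can be sketched in Python as follows (assuming Python 3.8 or later for `math.comb`). The variable names are illustrative only.

```python
from fractions import Fraction
from math import comb

toxic, nontoxic, drawn = 3, 397, 4

# Pick 1 toxic kernel AND 3 nontoxic kernels, then multiply the counts.
favorable = comb(toxic, 1) * comb(nontoxic, drawn - 1)
possible = comb(toxic + nontoxic, drawn)
p = Fraction(favorable, possible)

print(favorable, possible)  # 31049370 1050739900
print(round(float(p), 3))   # 0.03
```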


TRY IT YOURSELF 8 

A jury consists of five men and seven women. Three jury members are selected
at random for an interview. Find the probability that all three are men.
Answer: Page A34 




3.4 EXERCISES 


For Extra Help: MyLab Statistics 


Building Basic Skills and Vocabulary 


1. When you calculate the number of permutations of n distinct objects taken
r at a time, what are you counting? Give an example.


2. When you calculate the number of combinations of r objects taken from a 
group of n objects, what are you counting? Give an example. 


True or False? In Exercises 3-6, determine whether the statement is true or
false. If it is false, rewrite it as a true statement.


3. A combination is an ordered arrangement of objects. 
4. The number of different ordered arrangements of n distinct objects is n!. 


5. When you divide the number of permutations of 11 objects taken 3 at a time 
by 3!, you will get the number of combinations of 11 objects taken 3 at a time. 


6. $_{7}C_{5} = {}_{7}C_{2}$


In Exercises 7-14, perform the indicated calculation. 


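7. [illegible in source]    8. $_{14}P_{3}$
9. [illegible in source]    10. [illegible in source]
11. [numerator illegible] / $_{12}C_{6}$    12. [numerator illegible] / $_{14}C_{7}$
13. [numerator illegible] / $_{13}P_{1}$    14. [numerator illegible] / $_{12}P_{4}$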


In Exercises 15-18, determine whether the situation involves permutations, 
combinations, or neither. Explain your reasoning. 


15. The number of ways 16 floats can line up in a row for a parade 


16. The number of ways a four-member committee can be chosen from 
10 people 

17. The number of ways 2 captains can be chosen from 28 players on a lacrosse 
team 


18. The number of four-letter passwords that can be created when no letter can 
be repeated 


Using and Interpreting Concepts 


19. Trophies You have won nine different trophies. How many different ways 
can you arrange the trophies side by side on a shelf? 


20. Skiing Eight people compete in a downhill ski race. Assuming that there 
are no ties, in how many different orders can the skiers finish? 


21. Security Code In how many ways can the letters A, B, C, D, E, and F be 
arranged for a six-letter security code? 


22. Baseball The starting lineup for a baseball team in an international match 
is 12 players. How many different batting orders are possible using the 
starting lineup? 


23. Footrace There are 50 runners in a race. How many ways can the runners
finish first, second, and third?


24. Dancing Competition There are 20 finalists in an inter-school dancing
competition. The top three students will receive prize money. How many
ways can the dancers finish first through third?


25. Playlist A DJ is preparing a playlist of 24 songs. How many different ways
can the DJ choose the first six songs?


26. Archaeology Club An archaeology club has 38 members. How many
different ways can the club select a president, vice president, treasurer, and
secretary?


27. Blood Donors At a blood drive, 5 donors with type O+ blood, 7 donors
with type A+ blood, and 3 donors with type B+ blood are in line. In how
many distinguishable ways can the donors be in line?


28. Necklaces You are putting 9 pieces of blue beach glass, 3 pieces of red
beach glass, and 7 pieces of green beach glass on a necklace. In how many
distinguishable ways can the beach glass be put on the necklace?


29. Letters In how many distinguishable ways can the letters in the word
statistics be written?


30. Computer Science A byte is a sequence of eight bits. A bit can be a 0 or
a 1. In how many distinguishable ways can you have a byte with five 0’s and
three 1’s?


31. Experimental Group In order to conduct an experiment, 4 subjects are
randomly selected from a group of 20 subjects. How many different groups
of four subjects are possible?


32. Jury Selection From a group of 36 people, a jury of 12 people is selected.
In how many different ways can a jury of 12 people be selected?


33. Students A class has 50 students. In how many different ways can five
students form a group to work on a class project? (Assume the order of the
students is not important.)


34. Lottery Number Selection A lottery has 52 numbers. In how many different
ways can 6 of the numbers be selected? (Assume that order of selection is
not important.)


35. Menu A restaurant offers a dinner special that lets you choose from
10 entrées, 8 side dishes, and 13 desserts. You can choose one entrée, one
side dish, and two desserts. How many different meals are possible?


36. Floral Arrangements A floral arrangement consists of 6 different colored
roses, 3 different colored carnations, and 3 different colored daisies. You can
choose from 8 different colors of roses, 6 different colors of carnations, and
7 different colors of daisies. How many different arrangements are possible?


37. Milk Adulteration A food inspection office is analyzing milk samples
from 75 vendors for adulteration. The milk supplied by six of the vendors
has high levels of adulteration. Five vendors are randomly selected from
the sample. Using technology, how many ways could two vendors who put
adulterants in milk and three vendors who do not be chosen?


38. Property Inspection A property inspector is visiting 24 properties. Six of
the properties are one acre or less in size, and the rest are greater than one
acre in size. Eight properties are randomly selected. Using technology, how
many ways could three properties that are each one acre or less and five
properties that are each larger than one acre be chosen?


198 CHAPTER 3. Probability 


[Pie chart: How Many of Your Closest Family and Friends Have Food Allergies or Intolerances? Legible labels include "Most" and "Only a few" (43%).]


FIGURE FOR EXERCISES 45-48


39. Council of Australian Governments The Council of Australian Governments
has 10 members. Two members are chosen to serve as the executive
committee members. Each council member is equally likely to serve in
these positions. What is the probability of randomly selecting the committee
members? (Source: Council of Australian Governments)


40. University Committee The University of California Health Services
committee has five members. Two members are chosen to serve as the
committee chair and vice chair. Each committee member is equally likely
to serve in either of these positions. What is the probability of randomly
selecting the chair and the vice chair? (Source: University of California)


41. Horse Race A horse race has 12 entries. Assuming that there are no ties,
what is the probability that the three horses owned by one person finish first,
second, and third?


42. Salad Dressing A salad shop offers eleven sauces. No sauce is used more
than once. What is the probability that the sauces in a four-sauce salad are
mayonnaise, mustard, barbecue, and red chili?


43. Jukebox You look over the songs on a jukebox and determine that you like
15 of the 56 songs.

(a) What is the probability that you like the next three songs that are
played? (Assume a song cannot be repeated.)

(b) What is the probability that you do not like the next three songs that are
played? (Assume a song cannot be repeated.)


44. Officers The offices of president, vice president, secretary, and treasurer
for an environmental club will be filled from a pool of 14 candidates. Six of
the candidates are members of the debate team.

(a) What is the probability that all of the offices are filled by members of the
debate team?

(b) What is the probability that none of the offices are filled by members of
the debate team?


Food Allergies or Intolerances In Exercises 45-48, use the pie chart, which
shows the results of a survey of 1500 U.S. adults who were asked how many of
their closest family and friends have food allergies or intolerances. (Adapted from
Pew Research Center)


45. You choose 2 adults at random. What is the probability that both say most
of their closest family and friends have food allergies or intolerances?


46. You choose 3 adults at random. What is the probability that all three say
none of their closest family and friends have food allergies or intolerances?


47. You choose 6 adults at random. What is the probability that none of the
six say some of their closest family and friends have food allergies or
intolerances?


48. You choose 4 adults at random. What is the probability that none of the
four say only a few of their closest family and friends have food allergies or
intolerances?


49. Lottery In a state lottery, you must correctly select 7 numbers (in any
order) out of 70 to win the top prize. You purchase one lottery ticket. What
is the probability that you will win the top prize?


50. Committee A company that has 200 employees chooses a committee of
5 to represent employee retirement issues. When the committee is formed,
none of the 56 minority employees are selected.


(a) Use technology to find the number of ways 5 employees can be chosen 
from 200. 


(b) Use technology to find the number of ways 5 employees can be chosen 
from 144 nonminorities. 


(c) What is the probability that the committee contains no minorities when 
the committee is chosen randomly (without bias)? 


(d) Does your answer to part (c) indicate that the committee selection is 
biased? Explain your reasoning. 


Warehouse In Exercises 51-54, a warehouse employs 24 workers on first
shift, 17 workers on second shift, and 13 workers on third shift. Eight workers are
chosen at random to be interviewed about the work environment.


51. Find the probability of choosing five first-shift workers.
52. Find the probability of choosing six second-shift workers.
53. Find the probability of choosing four third-shift workers.
54. Find the probability of choosing two second-shift workers and two third-shift
workers.


Extending Concepts 


55. Defective Units A shipment of 10 microwave ovens contains 2 defective
units. A restaurant buys three units. What is the probability of the restaurant
buying at least two nondefective units?


56. Defective Disks A pack of 100 recordable DVDs contains 5 defective
disks. You select four disks. What is the probability of selecting at least three
nondefective disks?


57. Employee Selection Four sales representatives for a company are to
be chosen at random to participate in a training program. The company
has eight sales representatives, two in each of four regions. What is the
probability that the four sales representatives chosen to participate in the
training program will be from only two of the four regions?


58. Employee Selection In Exercise 57, what is the probability that the four
sales representatives chosen to participate in the training program will be
from only three of the four regions?


Cards In Exercises 59-62, you are dealt a hand of five cards from a standard
deck of 52 playing cards.


59. Find the probability of being dealt two clubs and one of each of the other
three suits.


60. Find the probability of being dealt four of a kind.


61. Find the probability of being dealt a full house (three of one kind and two of
another kind).


62. Find the probability of being dealt three of a kind (the other two cards are
different from each other).


Statistics in the Real World


Uses 


Probability affects decisions when the weather is forecast, when medications 
are selected, and even when players are selected for professional sports teams. 
Although intuition is often used for determining probabilities, you will be better 
able to assess the likelihood of an event by applying the rules of probability. 

For instance, suppose you work for a real estate company and are asked to estimate the
likelihood that a particular house will sell for a particular price within the next 90 
days. You could use your intuition, but you could better assess the probability by 
looking at sales records for similar houses. 


Abuses 


One common abuse of probability is thinking that probabilities have “memories.” 
For instance, the probability that a coin tossed eight times will land heads up 
every time is about 0.004. However, when seven heads have been tossed in a row, 
the probability that the eighth toss lands heads up is 0.5. Each toss is independent 
of all other tosses. The coin does not “remember” that it has already landed 
heads up seven times. 
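The two numbers in this paragraph can be reproduced exactly with a short Python sketch, assuming a fair coin and independent tosses. It computes the chance of eight heads in a row and then the conditional probability of an eighth head given seven, as the ratio P(8 heads) / P(first 7 heads).

```python
from fractions import Fraction

p_heads = Fraction(1, 2)        # fair coin
p_eight_heads = p_heads ** 8    # all eight tosses land heads up
p_seven_heads = p_heads ** 7    # the first seven tosses land heads up

# P(8th is heads | first 7 heads) = P(8 heads) / P(first 7 heads)
p_eighth_given_seven = p_eight_heads / p_seven_heads

print(float(p_eight_heads))   # 0.00390625, about 0.004
print(p_eighth_given_seven)   # 1/2
```

The conditional probability equals the unconditional 1/2, which is exactly what independence means: the earlier tosses carry no "memory" into the next one.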

A famous instance of this abuse happened at a casino in Monte Carlo, 
Monaco, in 1913. After a roulette wheel landed on black 15 times in a row, 
people started rushing to bet on red, thinking that the wheel was bound to land 
on red soon. The wheel kept landing on black, and players doubled and tripled 
their bets, using the same reasoning. The wheel ended up landing on black a 
record 26 times in a row, costing players millions. 


Ethics 


A study by economists Daniel Chen, Tobias Moskowitz, and Kelly Shue found 
evidence that the gambler’s fallacy occasionally leads baseball umpires, loan 
officers, and judges in refugee asylum courts to make mistakes. For instance, 
when loan officers have approved five loan applications in a row, they might 
think that six deserving loans in a row is unlikely and reject the sixth application 
based on a minor flaw when objectively it should be approved. The study 
concluded that up to 9% of loan decisions are influenced by this fallacy. 
Similarly, when judges are reviewing a request for asylum, they might be more 
likely to deny the case if they approved the last two cases. The authors of the 
study estimated that as many as 2% of asylum cases may be affected. Although 
not as serious an injustice as the first two examples, the study also found that 
baseball umpires are about 1.5% less likely to call a pitch a strike when they called 
the previous pitch a strike. For decision makers such as judges to make ethical 
decisions, they must attempt to view each case as independent from previous cases. 


EXERCISES 


A “Daily Number” lottery has a three-digit number from 000 to 999. You buy one 
ticket each day. Your number is 389. 


1. What is the probability of winning next Tuesday and Wednesday? 
2. You won on Tuesday. What is the probability of winning on Wednesday? 
3. You did not win on Tuesday. What is the probability of winning on Wednesday? 




Chapter Summary 201 


Chapter Summary


What Did You Learn? | Example(s) | Review Exercises

Section 3.1
• How to identify the sample space of a probability experiment and how to identify simple events | 1, 2 | 1-4
• How to use the Fundamental Counting Principle to find the number of ways two or more events can occur | 3, 4 | 5, 6
• How to distinguish among classical probability, empirical probability, and subjective probability | 5-8 | 7-12
• How to find the probability of the complement of an event and how to use a tree diagram and the Fundamental Counting Principle to find probabilities | 9-11 | 13-16

Section 3.2
• How to find the probability of an event given that another event has occurred | 1 | 17, 18
• How to distinguish between independent and dependent events | 2 | 19-22
• How to use the Multiplication Rule to find the probability of two or more events occurring in sequence and to find conditional probabilities | 3-5 | 23, 24
$P(A \text{ and } B) = P(A) \cdot P(B \mid A)$    Events A and B are dependent.
$P(A \text{ and } B) = P(A) \cdot P(B)$    Events A and B are independent.

Section 3.3
• How to determine whether two events are mutually exclusive | 1 | 25, 26
• How to use the Addition Rule to find the probability of two events | 2-5 | 27-40
$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$
$P(A \text{ or } B) = P(A) + P(B)$    Events A and B are mutually exclusive.

Section 3.4
• How to find the number of ways a group of objects can be arranged in order and the number of ways to choose several objects from a group without regard to order | 1-5 | 41-48
$_{n}P_{r} = \dfrac{n!}{(n-r)!}$    Permutations of n objects taken r at a time
$\dfrac{n!}{n_1! \cdot n_2! \cdot n_3! \cdots n_k!}$    Distinguishable permutations
$_{n}C_{r} = \dfrac{n!}{(n-r)!\,r!}$    Combinations of n objects taken r at a time
• How to use counting principles to find probabilities | 6-8 | 49-53




3 Review Exercises 


Section 3.1 


In Exercises 1-4, identify the sample space of the probability experiment and
determine the number of outcomes in the event. Draw a tree diagram when 
appropriate. 


1. Experiment: Tossing four coins 
Event: Getting three heads 


2. Experiment: Rolling 2 six-sided dice 
Event: Getting a sum of 4 or 5 


3. Experiment: Choosing a month of the year 
Event: Choosing a month that begins with the letter J 


4. Experiment: Guessing the gender(s) of the three children in a family
Event: Guessing that the family has two boys 


In Exercises 5 and 6, use the Fundamental Counting Principle. 


5. A student has to choose a menu consisting of a soup, a sandwich and a 
dessert from choices of 3 soups, 5 sandwiches and 2 desserts. How many ways 
can the student choose the menu? 


6. The registration numbers of candidates appearing for an examination have 
four letters followed by three digits. Assuming that any letter or digit can be 
used, how many different registration numbers are possible? 


In Exercises 7-12, classify the statement as an example of classical probability, 
empirical probability, or subjective probability. Explain your reasoning. 


7. Based on the results of a life-testing experiment, a manufacturer says there 
is a 0.07 probability that a randomly chosen component will last for 3 hours. 


8. The probability of randomly selecting an ace from a standard deck of 
52 playing cards is about 0.077. 


9. The chance that it will rain tomorrow is 25%. 
10. The probability that a person will be able to swim 20 miles is 40%. 
11. The probability of getting a sum of 10 or 11 when a pair of dice is rolled is $\frac{5}{36}$.


12. The chance that a randomly selected person in the United States is between 
17 and 23 years old is about 9.5%. (Source: U.S. Census Bureau) 


In Exercises 13 and 14, use the table, which shows the numbers (in thousands) of 
bachelor’s degrees for a recent year. (Source: National Center for Education Statistics) 
Degree | Business | Health professions and related | Social sciences/history | Psychology | Other
Number (thousands) | 361 | 181 | 178 | 114 | 1006


13. Find the probability that a randomly selected degree will be in business or 
psychology. 


14. Find the probability that a randomly selected degree will not be in health
professions or social sciences/history. 


Review Exercises 203 


Telephone Numbers The telephone numbers for a region of Pennsylvania 
have an area code of 570. The next seven digits represent the local telephone numbers 
for that region. These cannot begin with a 0 or 1. In Exercises 15 and 16, assume your 
cousin lives within the given area code. 


15. What is the probability of randomly generating your cousin’s telephone 
number on the first try? 


16. What is the probability of not randomly generating your cousin’s telephone 
number on the first try? 


Section 3.2 


In Exercises 17 and 18, use the table, which shows the numbers of students from 
American Bar Association approved law schools who took the Bar Examination 
for the first time in a recent year and the numbers of students who repeated the 
exam that year. (Source: National Conference of Bar Examiners) 


 | Passed | Failed | Total
First time | 36,534 | 13,194 | 49,728
Repeat | 6,454 | 10,581 | 17,035
Total | 42,988 | 23,775 | 66,763


17. Find the probability that a student took the exam for the first time, given that 
the student failed. 


18. Find the probability that a student passed, given that the student repeated 
the exam. 


In Exercises 19-22, determine whether the events are independent or dependent. 
Explain your reasoning. 


19. Rolling a die three times, getting two sixes, and rolling it a fourth time and 
getting a two. 


20. Participating in a training camp for running marathons and successfully 
completing a marathon run. 


21. Regularly attending the lectures of a course and passing that course. 


22. You are given that P(A) = 0.35 and P(B’) = 0.25. Do you have enough 
information to find P(B) and P(A and B)? Explain. 


23. Two balls are drawn from a bag containing 20 white and 5 black balls. In the first 
draw a ball is drawn at random and then replaced in the bag. In the second, a ball 
is drawn again at random. Find the probability of getting black balls in both the 
draws. Is this an unusual event? Explain. 


24. A bag has 12 white, 10 red and 8 black balls. What is the probability that 
without looking in the bag, you will first select and remove a white ball, and 
then select either a red or black ball? Is this an unusual event? Explain. 


Section 3.3 


In Exercises 25 and 26, determine whether the events are mutually exclusive. 
Explain your reasoning. 


25. Event A: Randomly select a red jelly bean from a jar. 
Event B: Randomly select a yellow jelly bean from the same jar. 


26. Event A: Randomly select a person who is a professional singer. 
Event B: Randomly select a person who is a professional dancer. 




FIGURE FOR EXERCISE 32 
[Figure: spinner] 


FIGURE FOR EXERCISES 33 AND 34 
[Pie chart: Students in Public Schools; legible sectors include 500-999 students (33.2%) and 300-499 students (27.4%)] 


27. Event A: Randomly select a U.S. citizen of Indian origin. 
Event B: Randomly select a U.S. citizen of Chinese origin. 


28. A sample of 6500 automobiles found that 1560 of the automobiles were 
black, 3120 of the automobiles were sedans, and 1170 of the automobiles 
were black sedans. Find the probability that a randomly chosen automobile 
from this sample is black or a sedan. 


In Exercises 29-32, find the probability. 


29. In a random sample of 300 male professionals, it is found that 40% play golf, 
60% play soccer and 30% play both golf and soccer. Find the probability that 
a person selected at random from this sample plays golf or soccer. 


30. In a community, 60% of the population speaks English, 30% speaks Spanish, 
and 15% speaks both English and Spanish. Find the probability that a 
randomly chosen person from this community speaks English or Spanish. 

31. A card is randomly selected from a standard deck of 52 playing cards. Find 
the probability that the card is between 7 and 10, inclusive, or is black. 

32. The spinner shown at the left is spun. The spinner is equally likely to land on 
each number. Find the probability that the spinner lands on a multiple of 3 
or a number greater than 5. 


In Exercises 33 and 34, use the pie chart at the left, which shows the percent 
distribution of the number of students in U.S. public schools in a recent year. 
(Source: U.S. National Center for Education Statistics) 


33. A 10-sided die, numbered 1 to 10, is rolled. Find the probability that the roll 
results in an even number or a number greater than 6. 


34. Find the probability of randomly selecting a school with 300 or more students. 


In Exercises 35-38, use the Pareto chart, which shows the results of a survey in which 
3078 adults were asked with which social class they identify. (Adapted from Gallup) 


[Pareto chart: Americans’ Social Class Self-Identification. Horizontal axis: Response (class labels not legible); bar heights (number of adults): 1329, 929, 468, 254, 98] 


35. Find the probability of randomly selecting an adult who identifies as middle 
or upper-middle class. 


36. Find the probability of randomly selecting an adult who identifies as working 
or lower class. 


37. Find the probability of randomly selecting an adult who does not identify as 
middle class. 


38. Find the probability of randomly selecting an adult who does not identify as 
upper or lower class. 


Letter grade | Number of students 
A | 8 
B | 10 
C | 12 
D | 
F | 


TABLE FOR EXERCISE 52 


Review Exercises 205 


39. You are given that P(A) = 0.15 and P(B) = 0.40. Do you have enough 
information to find P(A or B)? Explain. 


40. You are given that P(A or B) = 0.55 and P(A) + P(B) = 1. Do you have 
enough information to find P(A and B)? Explain. 


Section 3.4 


In Exercises 41-44, perform the indicated calculation. 


41. 1;P)    42. sPs    43. 7C,    44. 5C3 / 10C3 


In Exercises 45-48, use combinations and permutations. 
45. Fifteen cyclists enter a race. How many ways can the cyclists finish first, 
second, and third? 


46. Six different letters are to be put in six envelopes of different colors so 
that each envelope gets exactly one letter. In how many ways can this 
be done? 


47. A literary magazine editor must choose 4 short stories for this month’s 
issue from 17 submissions. In how many ways can the editor choose this 
month’s stories? 


48. A basketball coach has to choose 5 players from a list of 12 players. In how 
many ways can the coach choose the 5 players? 


In Exercises 49-53, use counting principles to find the probability. 


49. In a card game, you are dealt a hand of six cards at random from a 
standard deck of 52 cards. You are declared the winner if you are dealt 
one ace, two kings, and three queens. Find the probability of winning the 
game. 


50. A code consists of two distinct letters followed by three digits. The last digit 
cannot be 0 or 1. What is the probability of guessing the security code on the 
first try? 


51. A batch of 100 mobile phones contains 5 smartphones. What is the 
probability that a sample of three phones will have 
(a) no smartphones? 
(b) all smartphones? 
(c) at least one smartphone? 
(d) at least one non-smartphone? 
52. A batch of 200 laptops contains three defective laptops. You choose three 
laptops at random. What is the probability that you have 
(a) no defective laptops? 
(b) all three defective laptops? 
(c) two defective laptops? 
(d) at least one defective laptop? 
53. A panel consists of three female and seven male experts. Three experts are 
chosen at random from this panel to serve on a selection committee. What is 
the probability of choosing 


(a) three men? 
(b) all women? 
(c) two men and one woman? 
(d) one man and two women? 




3 Chapter Quiz 


Take this quiz as you would take a quiz in class. After you are done, check your 
work against the answers given in the back of the book. 


1. The access code for a warehouse’s security system consists of six digits. The 
first digit cannot be 0 and the last digit must be even. How many access codes 
are possible? 


2. The table shows the numbers (in thousands) of earned degrees by level in two 
different fields, conferred in the United States in a recent year. (Source: U.S. 
National Center for Education Statistics) 


Level of degree | Natural sciences/mathematics | Computer science/engineering | Total 
Bachelor’s | 154.9 | 164.3 | 319.2 
Master’s | 28.2 | 71.9 | 100.1 
Doctoral | 16.0 | 12.1 | 28.1 
Total | 199.1 | 248.3 | 447.4 


A person who earned a degree in the year is randomly selected. Find the 
probability of selecting someone who 

(a) earned a bachelor’s degree. 

(b) earned a bachelor’s degree, given that the degree is in computer science/ 
engineering. 

(c) earned a bachelor’s degree, given that the degree is not in computer 
science/engineering. 

(d) earned a bachelor’s degree or a master’s degree. 

(e) earned a doctorate, given that the degree is in computer science/engineering. 

(f) earned a master’s degree or the degree is in natural sciences/mathematics. 

(g) earned a bachelor’s degree and the degree is in natural sciences/mathematics. 

(h) earned a degree in computer science/engineering, given that the person 
earned a bachelor’s degree. 


3. Which event(s) in Exercise 2 can be considered unusual? Explain. 


4. Determine whether the events are mutually exclusive. Then determine 
whether the events are independent or dependent. Explain your reasoning. 

Event A: A bowler having the highest game in a 40-game tournament 
Event B: Losing the bowling tournament 


5. From a pool of 30 candidates, the offices of president, vice president, 
secretary, and treasurer will be filled. In how many different ways can the 
offices be filled? 


6. A shipment of 250 netbooks contains 3 defective units. Determine how many 
ways a vending company can buy three of these units and receive 

(a) no defective units. (b) all defective units. (c) at least one good unit. 


7. In Exercise 6, find the probability of the vending company receiving 

(a) no defective units. (b) all defective units. (c) at least one good unit. 


Chapter Test 


Take this test as you would take a test in class. 


1. Sixty-five runners compete in a 10k race. Your school has 12 runners in the 
race. What is the probability that three runners from your school place first, 
second, and third? 


2. A security code consists of a person’s first and last initials and four digits. 

(a) What is the probability of guessing a person’s code on the first try? 

(b) What is the probability of not guessing a person’s code on the first try? 

(c) You know a person’s first name and that the last digit is odd. What is the 
probability of guessing this person’s code on the first try? 

(d) Are the statements in parts (a)-(c) examples of classical probability, 
empirical probability, or subjective probability? Explain your reasoning. 


3. Determine whether the events are mutually exclusive. Explain your reasoning. 

Event A: Randomly select a student born on the 30th of a month 
Event B: Randomly select a student with a birthday in February 


4. The table shows the sixth, seventh, and eighth grade student enrollment levels 
(in thousands) in Minnesota and Ohio schools in a recent year. (Source: U.S. 
National Center for Education Statistics) 


State | Sixth grade | Seventh grade | Eighth grade | Total 
Minnesota | 61.8 | 63.6 | 62.9 | 188.3 
Ohio | 130.8 | 134.3 | 135.0 | 400.1 
Total | 192.6 | 197.9 | 197.9 | 588.4 


A student in one of the indicated grades and states is randomly selected. Find 
the probability of selecting a student who 

(a) is in sixth grade 

(b) is in sixth or seventh grade 

(c) is in eighth grade, given that the student is enrolled in Minnesota 

(d) is enrolled in Ohio, given that the student is in seventh grade 

(e) is in seventh grade or is enrolled in Minnesota 

(f) is in sixth grade and is enrolled in Ohio 


5. Which event(s) in Exercise 4 can be considered unusual? Explain your 
reasoning. 


6. A student is selected at random from the sample in Exercise 4. Are the events 
“the student is in sixth grade” and “the student is enrolled in Minnesota” 
independent or dependent? Explain your reasoning. 


7. There are 16 students giving final presentations in your history course. 

(a) Three students present per day. How many presentation orders are 
possible for the first day? 

(b) Presentation subjects are based on the units of the course. Unit B is 
covered by three students, Unit C is covered by five students, and Units A 
and D are each covered by four students. How many presentation orders 
are possible when presentations on the same unit are indistinguishable 
from each other? 


Putting it all together 


REAL DECISIONS 


You work in the security department of a bank’s website. To 
access their accounts, customers of the bank must create an 
8-digit password. It is your job to determine the password 
requirements for these accounts. Security guidelines state 
that for the website to be secure, the probability that an 
8-digit password is guessed on one try must be less than \( \frac{1}{60^8} \), 
assuming all passwords are equally likely. 

Your job is to use the probability techniques you have 
learned in this chapter to decide what requirements a 
customer must meet when choosing a password, including 
what sets of characters are allowed, so that the website is 
secure according to the security guidelines. 


EXERCISES 


1. How Would You Do It? 


(a) How would you investigate the question of what password 
requirements you should set to meet the security guidelines? 


(b) What statistical methods taught in this chapter would you use? 




2. Answering the Question 


(a) What password requirements would you set? What characters 
would be allowed? 


(b) Show that the probability that a password is guessed on one try 
is less than \( \frac{1}{60^8} \) when the requirements in part (a) are used and 
all passwords are equally likely. 


3. Additional Security 
For additional security, each customer creates a 5-digit PIN (personal 
identification number). The table shows the 10 most 
commonly chosen 5-digit PINs. From the table, you can see that 
more than a third of all 5-digit PINs could be guessed by trying 
these 10 numbers. To discourage customers from using predictable 
PINs, you consider prohibiting PINs that use the same digit more 
than once. 
(a) How would this requirement affect the number of possible 
5-digit PINs? 
(b) Would you decide to prohibit PINs that use the same digit more 
than once? Explain. 


Most Popular 5-Digit PINs 
Rank | PIN | Percent 
1 | 12345 | 22.80% 
2 | 11111 | 4.48% 
3 | 55555 | 1.77% 
4 | 00000 | 1.26% 
5 | 54321 | 1.20% 
6 | 13579 | 1.11% 
7 | 77777 | 0.62% 
8 | 22222 | 0.45% 
9 | 12321 | 0.41% 
10 | 99999 | 0.40% 
(Source: Datagenetics.com) 




TECHNOLOGY 


Simulation: Composing Mozart Variations with Dice 


Wolfgang Mozart (1756-1791) composed a wide 
variety of musical pieces. In his Musical Dice 
Game, he wrote a minuet with an almost endless 
number of variations. Each minuet has 16 bars. 
In the eighth and sixteenth bars, the player has a 
choice of two musical phrases. In each of the other 
14 bars, the player has a choice of 11 phrases. 


To create a minuet, Mozart suggested that 
the player toss 2 six-sided dice 16 times. For the 
eighth and sixteenth bars, choose Option 1 when 
the dice total is odd and Option 2 when it is even. 
For each of the other 14 bars, subtract 1 from the 
dice total. The minuet shown is the result of the 
following sequence of numbers. 


5 7 1 6 4 10 5 1 


[Figure: the minuet produced by this sequence, shown in musical notation with the chosen phrase labeled on each bar; TI-84 Plus screen not reproduced] 
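The dice procedure Mozart suggested is straightforward to simulate. The sketch below is one possible Python version, assuming fair six-sided dice and using Python's standard `random` module in place of the Minitab, Excel, or TI-84 Plus tools named in this section; the function names are illustrative only. It applies the two rules described above: bars 8 and 16 take Option 1 on an odd dice total and Option 2 on an even total, and each of the other 14 bars takes the dice total minus 1.

```python
import random


def roll_two_dice(rng: random.Random) -> int:
    """Return the total of two fair six-sided dice (2 to 12)."""
    return rng.randint(1, 6) + rng.randint(1, 6)


def compose_minuet(rng: random.Random) -> list[int]:
    """Return the phrase option chosen for each of the 16 bars.

    Bars 8 and 16: option 1 if the dice total is odd, option 2 if it is even.
    Every other bar: dice total minus 1, which gives an option from 1 to 11.
    """
    choices = []
    for bar in range(1, 17):
        total = roll_two_dice(rng)
        if bar in (8, 16):
            choices.append(1 if total % 2 == 1 else 2)
        else:
            choices.append(total - 1)
    return choices


rng = random.Random(1756)  # fixed seed so the example is reproducible
print(compose_minuet(rng))
```

Because the seed is fixed, rerunning the script reproduces the same minuet; creating `random.Random()` without a seed gives a new variation on each run.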


1. How many phrases did Mozart write to create the 
Musical Dice Game minuet? Explain. 


2. How many possible variations are there in Mozart’s 
Musical Dice Game minuet? Explain. 


3. Use technology to randomly select a number from 1 to 11. 

(a) What is the theoretical probability of each number 
from 1 to 11 occurring? 

(b) Use this procedure to select 100 integers from 1 to 11. 
Tally your results and compare them with the 
probabilities in part (a). 


4. What is the probability of randomly selecting option 
6, 7, or 8 for the first bar? For all 14 bars? Find each 
probability using (a) theoretical probability and (b) 
the results of Exercise 3(b). 


5. Use technology to randomly select two numbers from 
1 to 6. Find the sum and subtract 1 to obtain a total. 

(a) What is the theoretical probability of each total 
from 1 to 11? 

(b) Use this procedure to select 100 totals from 1 to 11. 
Tally your results and compare them with the 
probabilities in part (a). 


6. Repeat Exercise 4 using the results of Exercise 5. 


Extended solutions are given in the technology manuals that accompany this text. 
Technical instruction is provided for Minitab, Excel, and the TI-84 Plus. 




Discrete Probability 
Distributions 


The National Climatic Data Center (NCDC) is the world’s largest active archive of weather 
data. NCDC archives weather data from the Coast Guard, Federal Aviation Administration, 
Military Services, the National Weather Service, and voluntary observers. 


4.1 Probability Distributions 

4.2 Binomial Distributions 
Activity 
Case Study 

4.3 More Discrete Probability Distributions 
Uses and Abuses 

Real Statistics—Real Decisions 
Technology 


Where You’ve Been 


In Chapters 1 through 3, you learned how to collect and 
describe data and how to find the probability of an event. 
These skills are used in many different types of careers. For 
instance, data about climatic conditions are used to analyze 
and forecast the weather throughout the world. On a typical 
day, meteorologists use data from aircraft, National Weather 
Service cooperative observers, radar, remote sensing 
systems, satellites, ships, weather balloons, wind profilers, 
and a variety of other data-collection devices to forecast the 
weather. Even with this much data, meteorologists cannot 
forecast the weather with certainty. Instead, they assign 
probabilities to certain weather conditions. For instance, a 
meteorologist might determine that there is a 40% chance 
of rain (based on the relative frequency of rain under 
similar weather conditions). 


Where You’re Going 


In Chapter 4, you will learn how to create and use 
probability distributions. Knowing the shape, center, 
and variability of a probability distribution enables 
you to make decisions in inferential statistics. For 
example, suppose you are a meteorologist working on a 
three-day forecast. Assuming that rain on one 
day is independent of rain on any other day, 
you have determined that there is a 40% probability 
of rain (and a 60% probability of no rain) on each 
of the three days. What is the probability that it will 
rain on 0, 1, 2, or 3 of the days? To answer this, you 
can create a probability distribution for the possible 
outcomes. 


Tree diagram (R = rain, probability 0.4 each day; N = no rain, probability 0.6 each day) 

Day 1 | Day 2 | Day 3 | Probability        | Days of rain 
N     | N     | N     | P(N, N, N) = 0.216 | 0 
N     | N     | R     | P(N, N, R) = 0.144 | 1 
N     | R     | N     | P(N, R, N) = 0.144 | 1 
N     | R     | R     | P(N, R, R) = 0.096 | 2 
R     | N     | N     | P(R, N, N) = 0.144 | 1 
R     | N     | R     | P(R, N, R) = 0.096 | 2 
R     | R     | N     | P(R, R, N) = 0.096 | 2 
R     | R     | R     | P(R, R, R) = 0.064 | 3 


Using the Addition Rule with the probabilities in the tree diagram, you can determine the probabilities of having rain on 
various numbers of days. You can then use this information to construct and graph a probability distribution. 


Probability Distribution 


Days of rain | Tally | Probability 
0 | 1 | 0.216 
1 | 3 | 0.432 
2 | 3 | 0.288 
3 | 1 | 0.064 


[Histogram: Number of Days of Rain; horizontal axis: Days of rain; vertical axis: Probability, P(x)] 
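The tree diagram and the Addition Rule can be combined in a short program. This is a minimal Python sketch, assuming (as the text does) that rain on different days is independent with probability 0.4 each day; `rain_distribution` is a hypothetical helper name. It lists every rain/no-rain sequence, multiplies the daily probabilities along each branch, and then adds the branches that share the same number of rainy days.

```python
from collections import defaultdict
from itertools import product

P_RAIN = 0.4  # probability of rain on any one day (from the text)
DAYS = 3


def rain_distribution(p_rain: float, days: int) -> dict[int, float]:
    """Return P(x) for x = number of rainy days out of `days` independent days."""
    dist = defaultdict(float)
    # Each outcome is a branch of the tree, e.g. (False, False, True); True = rain.
    for outcome in product([False, True], repeat=days):
        prob = 1.0
        for rain in outcome:
            # Multiplication Rule: the days are assumed independent
            prob *= p_rain if rain else 1 - p_rain
        # Addition Rule: branches with the same number of rainy days are added
        dist[sum(outcome)] += prob
    return dict(sorted(dist.items()))


for x, p in rain_distribution(P_RAIN, DAYS).items():
    print(x, round(p, 3))
# 0 0.216
# 1 0.432
# 2 0.288
# 3 0.064
```

Enumerating all 2^days branches is only practical for a few days; the binomial formulas later in the chapter handle longer forecasts directly.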




CHAPTER 4 _ Discrete Probability Distributions 


What You Should Learn 


» How to distinguish between 
discrete random variables and 
continuous random variables 


» How to construct and graph a 
discrete probability distribution 
and how to determine whether 
a distribution is a probability 
distribution 


» How to find the mean, variance, 
and standard deviation of a 
discrete probability distribution 


» How to find the expected 
value of a discrete probability 
distribution 


Random Variables ■ Discrete Probability Distributions ■ Mean, Variance, 
and Standard Deviation ■ Expected Value 


Random Variables 


The outcome of a probability experiment is often a count or a measure. When 
this occurs, the outcome is called a random variable. 


DEFINITION 


A random variable x represents a value associated with each outcome of a 
probability experiment. 


The word random indicates that x is determined by chance. There are two 
types of random variables: discrete and continuous. 


DEFINITION 


A random variable is discrete when it has a finite or countable number of 
possible outcomes that can be listed. 


A random variable is continuous when it has an uncountable number of 
possible outcomes, represented by an interval on a number line. 


In most applications, discrete random variables represent counted data, 
while continuous random variables represent measured data. For instance, 
suppose you conduct a study of the number of calls a 
telemarketing firm makes in one day. The possible values of the random variable 
x are 0, 1, 2, 3, 4, and so on. Because the set of possible outcomes {0, 1, 2, 3, . . .} 
can be listed, x is a discrete random variable. You can represent its values as 
points on a number line. 


Number of Calls (Discrete) 
[Number line from 0 to 10 with a point at each whole number] 
x can be any whole number: 0, 1, 2, 3, . . . 


A different way to conduct the study would be to measure the time (in hours) 
the telemarketing firm spends making calls in one day. Because the time spent 
making calls can be any number from 0 to 24 (including fractions and decimals), 
x is a continuous random variable. You can represent its values with an interval 
on a number line. 


Hours Spent on Calls (Continuous) 
[Number line marked from 0 to 24 in steps of 3] 
x can be any value between 0 and 24. 


When a random variable is discrete, you can list the possible values the 
variable can assume. However, it is impossible to list all values for a continuous 
random variable. 


Study Tip 

Values of variables such as volume, age, height, and weight are sometimes 
rounded to the nearest whole number. These values represent measured 
data, however, so they are continuous random variables. 


SECTION 4.1 Probability Distributions 213 


Discrete Variables and Continuous Variables 
Determine whether each random variable x is discrete or continuous. Explain 
your reasoning. 


1. Let x represent the number of Fortune 500 companies that lost money in 
the previous year. 


2. Let x represent the volume of gasoline in a 21-gallon tank. 


SOLUTION 
1. The number of companies that lost money in the previous year can be 
counted. The set of possible outcomes is 


{0, 1, 2, 3, . . . , 500}. 


So, x is a discrete random variable. 


2. The amount of gasoline in the tank can be any volume between 0 gallons 
and 21 gallons. So, x is a continuous random variable. 


TRY IT YOURSELF 1 


Determine whether each random variable x is discrete or continuous. Explain 
your reasoning. 


1. Let x represent the speed of a rocket. 
2. Let x represent the number of calves born on a farm in one year. 


3. Let x represent the number of days of rain for the next three days (see 
page 211). 
Answer: Page A34 


It is important that you can distinguish between discrete and continuous 
random variables because different statistical techniques are used to analyze 
each. The remainder of this chapter focuses on discrete random variables and 
their probability distributions. Your study of continuous probability distributions 
will begin in Chapter 5. 


Discrete Probability Distributions 


Each value of a discrete random variable can be assigned a probability. By listing 
each value of the random variable with its corresponding probability, you are 
forming a discrete probability distribution. 


DEFINITION 


A discrete probability distribution lists each possible value the random 
variable can assume, together with its probability. A discrete probability 
distribution must satisfy these conditions. 


In Words | In Symbols 

1. The probability of each value of the discrete random variable is between 0 and 1, inclusive. | \( 0 \le P(x) \le 1 \) 

2. The sum of all the probabilities is 1. | \( \sum P(x) = 1 \) 


Because probabilities represent relative frequencies, a discrete probability 
distribution can be graphed with a relative frequency histogram. 




Frequency Distribution 

Score, x | Frequency, f 
1 | 24 
2 | 33 
3 | 42 
4 | 30 
5 | 21 


[Histogram: Passive-Aggressive Traits; horizontal axis: Score, x; vertical axis: Probability, P(x)] 


Frequency Distribution 

Sales per day, x | Number of days, f 
[Only the first row is legible: x = 0, f = 16] 


GUIDELINES 


Constructing a Discrete Probability Distribution 
Let x be a discrete random variable with possible outcomes \( x_1, x_2, \ldots, x_n \). 


1. Make a frequency distribution for the possible outcomes. 
2. Find the sum of the frequencies. 
3. Find the probability of each possible outcome by dividing its frequency 
by the sum of the frequencies. 
4. Check that each probability is between 0 and 1, inclusive, and that the 
sum of all the probabilities is 1. 


Constructing and Graphing a Discrete Probability Distribution 
An industrial psychologist administered a personality inventory test for 
passive-aggressive traits to 150 employees. Each individual was given a whole 
number score from 1 to 5, where 1 is extremely passive and 5 is extremely 
aggressive. A score of 3 indicated neither trait. The results are shown at the 
left. Construct a probability distribution for the random variable x. Then graph 
the distribution using a histogram. 


SOLUTION 


Divide the frequency of each score by the total number of individuals in the 
study to find the probability for each value of the random variable. 


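\( P(1) = \frac{24}{150} = 0.16 \qquad P(2) = \frac{33}{150} = 0.22 \qquad P(3) = \frac{42}{150} = 0.28 \) 

\( P(4) = \frac{30}{150} = 0.20 \qquad P(5) = \frac{21}{150} = 0.14 \) 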


The discrete probability distribution is shown in the table below. 


x    | 1    | 2    | 3    | 4    | 5 
P(x) | 0.16 | 0.22 | 0.28 | 0.20 | 0.14 


Note that the probability of each value of x is between 0 and 1, and the sum 
of the probabilities is 1. So, the distribution is a probability distribution. The 
graph of the distribution is shown in the histogram at the left. Because the 
width of each bar is one, the area of each bar is equal to the probability of a 
particular outcome. Also, the probability of an event corresponds to the sum of 
the areas of the outcomes included in the event. For instance, the probability 
of the event “having a score of 2 or 3” is equal to the sum of the areas of the 
second and third bars, 


(1) (0.22) + (1)(0.28) = 0.22 + 0.28 = 0.50. 
Interpretation You can see that the distribution is approximately symmetric. 
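The four guidelines can be followed step by step in code. Below is a minimal Python sketch using the frequencies from this example; the helper name `probability_distribution` is hypothetical. Exact fractions from the standard `fractions` module are used so that the check in step 4 is not disturbed by floating-point rounding.

```python
from fractions import Fraction


def probability_distribution(freq: dict[int, int]) -> dict[int, Fraction]:
    """Divide each frequency by the sum of the frequencies."""
    n = sum(freq.values())  # step 2: sum of the frequencies
    return {x: Fraction(f, n) for x, f in freq.items()}  # step 3


# Step 1: frequency distribution of the passive-aggressive scores
scores = {1: 24, 2: 33, 3: 42, 4: 30, 5: 21}
dist = probability_distribution(scores)

# Step 4: each probability lies in [0, 1] and the probabilities sum to 1
assert all(0 <= p <= 1 for p in dist.values())
assert sum(dist.values()) == 1

print({x: float(p) for x, p in dist.items()})
# {1: 0.16, 2: 0.22, 3: 0.28, 4: 0.2, 5: 0.14}

# P(score of 2 or 3): the sum of the areas of the second and third bars
print(float(dist[2] + dist[3]))  # 0.5
```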


TRY IT YOURSELF 2 


A company tracks the number of sales new employees make each day during a 
100-day probationary period. The results for one new employee are shown at 
the left. Construct a probability distribution for the random variable x. Then 
graph the distribution using a histogram. 

Answer: Page A34 


Picturing the World 


A study was conducted to 
determine how many credit cards 
people have. The results are shown 
in the histogram. (Adapted from 
American Association of Retired Persons) 


[Histogram: How Many Credit Cards Do You Have?; horizontal axis: Number (0 to 6); vertical axis: Probability, P(x)] 


Estimate the probability that a 
randomly selected person has two 
or three credit cards. 




Verifying a Probability Distribution 
Verify that the distribution for the three-day forecast (see page 211) and the 
number of days of rain is a probability distribution. 


Days of rain, x   | 0     | 1     | 2     | 3 
Probability, P(x) | 0.216 | 0.432 | 0.288 | 0.064 


SOLUTION 


If the distribution is a probability distribution, then (1) each probability is 
between 0 and 1, inclusive, and (2) the sum of all the probabilities equals 1. 


1. Each probability is between 0 and 1. 
2. \( \sum P(x) = 0.216 + 0.432 + 0.288 + 0.064 = 1 \) 


Interpretation Because both conditions are met, the distribution is a 
probability distribution. 
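The two conditions can also be checked programmatically. In this hedged Python sketch (the function name is hypothetical), decimal probabilities are stored as floating-point numbers, so condition 2 is tested with `math.isclose` and a small tolerance rather than exact equality; the tolerance of 1e-9 is an arbitrary choice.

```python
import math


def is_probability_distribution(probs: list[float], tol: float = 1e-9) -> bool:
    """Check the two conditions for a discrete probability distribution.

    Condition 1: every probability is between 0 and 1, inclusive.
    Condition 2: the probabilities sum to 1, compared with a small tolerance
    because decimals such as 0.216 are not stored exactly as floats.
    """
    each_in_range = all(0 <= p <= 1 for p in probs)
    sums_to_one = math.isclose(sum(probs), 1.0, abs_tol=tol)
    return each_in_range and sums_to_one


# Days of rain in the three-day forecast
print(is_probability_distribution([0.216, 0.432, 0.288, 0.064]))  # True
# Probabilities that sum to 1.07 fail condition 2
print(is_probability_distribution([0.28, 0.21, 0.43, 0.15]))  # False
```

A list such as `[1.5, -0.5]` sums to 1 but still fails, because condition 1 rules out negative values and values greater than 1.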


TRY IT YOURSELF 3 


Verify that the distribution you constructed in Try It Yourself 2 is a probability 
distribution. 


Answer: Page A34 


Identifying Probability Distributions 


Determine whether each distribution is a probability distribution. Explain your 
reasoning. 


1. x    | 5    | 6    | 7    | 8 
   P(x) | 0.28 | 0.21 | 0.43 | 0.15 

2. x    | 1 | 2 | 3 | 4 
   P(x) | [row not recoverable] 
SOLUTION 


1. Each probability is between 0 and 1, but the sum of all the probabilities 
is 1.07, which is greater than 1. The sum of all the probabilities in a 
probability distribution always equals 1. So, this distribution is not a 
probability distribution. 


2. The sum of all the probabilities is equal to 1, but P(3) and P(4) are not 
between 0 and 1. Probabilities can never be negative or greater than 1. So, 
this distribution is not a probability distribution. 


TRY IT YOURSELF 4 


Determine whether each distribution is a probability distribution. Explain your 
reasoning. 


1. x    | 5 | 6 | 7 | 8 
   P(x) | [row not recoverable] 

2. x    | 1    | 2    | 3    | 4 
   P(x) | 0.09 | 0.36 | 0.49 | 0.10 


Answer: Page A34 




Study Tip 


Notice that the mean in Example 5 is rounded to one decimal place. This 
rounding was done because the mean of a probability distribution 
should be rounded to one more decimal place than was used for the 
random variable x. This round-off rule is also used for the variance and 
standard deviation of a probability distribution. 


Mean, Variance, and Standard Deviation 


You can measure the center of a probability distribution with its mean and 
measure the variability with its variance and standard deviation. The mean of a 
discrete random variable is defined as follows. 


Mean of a Discrete Random Variable 


The mean of a discrete random variable is given by 


\( \mu = \sum xP(x) \). 


Each value of x is multiplied by its corresponding probability and the 
products are added. 


The mean of a random variable represents the “theoretical average” of 
a probability experiment and sometimes is not a possible outcome. If the 
experiment were performed many thousands of times, then the mean of all the 
outcomes would be close to the mean of the random variable. 
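The claim that the average of many outcomes settles near the mean can be illustrated with a simulation. Below is a minimal Python sketch, assuming the passive-aggressive score distribution from Example 2; the seed and the number of trials are arbitrary choices. A simulation illustrates the idea but does not prove it.

```python
import random

values = [1, 2, 3, 4, 5]
probs = [0.16, 0.22, 0.28, 0.20, 0.14]  # score distribution from Example 2

# Theoretical mean: sum of x * P(x)
mu = sum(x * p for x, p in zip(values, probs))

rng = random.Random(42)  # fixed seed for reproducibility
trials = 100_000
# random.choices draws with replacement, weighting each value by P(x)
outcomes = rng.choices(values, weights=probs, k=trials)
sample_mean = sum(outcomes) / trials

print(f"theoretical mean: {mu:.2f}")  # 2.94
print(f"mean of {trials} simulated outcomes: {sample_mean:.3f}")
```

With fewer trials (say, 100) the simulated mean wanders further from 2.94; increasing the number of trials tends to bring it closer.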


Finding the Mean of a Probability Distribution 


The probability distribution for the personality inventory test for passive- 
aggressive traits discussed in Example 2 is shown below. Find the mean score. 


Score, x          | 1    | 2    | 3    | 4    | 5 
Probability, P(x) | 0.16 | 0.22 | 0.28 | 0.20 | 0.14 


SOLUTION 
Use a table to organize your work, as shown below. 


x | P(x) | xP(x) 
1 | 0.16 | 1(0.16) = 0.16 
2 | 0.22 | 2(0.22) = 0.44 
3 | 0.28 | 3(0.28) = 0.84 
4 | 0.20 | 4(0.20) = 0.80 
5 | 0.14 | 5(0.14) = 0.70 
  | ΣP(x) = 1 | ΣxP(x) = 2.94 ≈ 2.9  ← Mean 


From the table, you can see that the mean score is 


\( \mu = 2.94 \approx 2.9 \)   Round to one decimal place. 


Note that the mean is rounded to one more decimal place than the possible 
values of the random variable x. 


Interpretation Recall that a score of 3 represents an individual who exhibits 
neither passive nor aggressive traits and the mean is slightly less than 3. So, the 
mean personality trait is neither extremely passive nor extremely aggressive, 
but is slightly closer to passive. 
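The table in Example 5 translates directly into a loop that multiplies each value by its probability and adds the products. Below is a minimal Python sketch; `mean_of_distribution` is a hypothetical helper name, and the final rounding to one decimal place follows the round-off rule stated in the Study Tip.

```python
def mean_of_distribution(dist: dict[int, float]) -> float:
    """Return mu = sum of x * P(x) for a discrete probability distribution."""
    return sum(x * p for x, p in dist.items())


scores = {1: 0.16, 2: 0.22, 3: 0.28, 4: 0.20, 5: 0.14}  # Example 5

# Reproduce the x, P(x), xP(x) columns of the table
print("x  P(x)  xP(x)")
for x, p in scores.items():
    print(f"{x}  {p:.2f}  {x * p:.2f}")

mu = mean_of_distribution(scores)
print(f"mean = {mu:.2f}, rounded to one decimal place: {mu:.1f}")
# mean = 2.94, rounded to one decimal place: 2.9
```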


TRY IT YOURSELF 5 


Find the mean of the probability distribution you constructed in Try It 
Yourself 2. What can you conclude? 
Answer: Page A34 


Study Tip 


An alternative formula for the variance of a 
probability distribution is 
\( \sigma^2 = \left[ \sum x^2 P(x) \right] - \mu^2 \). 


Tech Tip 


You can use technology such as Minitab, Excel, 
StatCrunch, or the TI-84 Plus to find the 
mean and standard deviation of a discrete 
random variable. For instance, to find the mean and standard 
deviation of the discrete random variable in Example 6 on a TI-84 
Plus, enter the possible values of the discrete random variable x in L1. 
Next, enter the probabilities P(x) in L2. Then, use the 1-Var Stats feature 
with L1 as the list and L2 as the frequency list to calculate the mean 
and standard deviation (and other statistics), as shown below. 


TI-84 PLUS 
[Screen: 1-Var Stats output, showing x̄ = 2.94, σx ≈ 1.2714, and n = 1] 




Although the mean of the random variable of a probability distribution 
describes a typical outcome, it gives no information about how the outcomes 
vary. To study the variation of the outcomes, you can use the variance and 
standard deviation of the random variable of a probability distribution. 


Variance and Standard Deviation of a Discrete Random Variable 


The variance of a discrete random variable is 
\( \sigma^2 = \sum (x - \mu)^2 P(x) \). 


The standard deviation is 


\( \sigma = \sqrt{\sigma^2} = \sqrt{\sum (x - \mu)^2 P(x)} \). 


Finding the Variance and Standard Deviation 


The probability distribution for the personality inventory test for passive- 
aggressive traits discussed in Example 2 is shown below. Find the variance and 
standard deviation of the probability distribution. 


Score, x          | 1    | 2    | 3    | 4    | 5 
Probability, P(x) | 0.16 | 0.22 | 0.28 | 0.20 | 0.14 


SOLUTION 


To find the variance and standard deviation, note that from Example 5 the 
mean of the distribution before rounding is \( \mu = 2.94 \). (Use this value to avoid 
rounding until the last calculation.) Use a table to organize your work, as 
shown below. 


x | P(x) | x - μ | (x - μ)² | (x - μ)²P(x) 
1 | 0.16 | -1.94 | 3.7636 | 0.602176 
2 | 0.22 | -0.94 | 0.8836 | 0.194392 
3 | 0.28 | 0.06 | 0.0036 | 0.001008 
4 | 0.20 | 1.06 | 1.1236 | 0.224720 
5 | 0.14 | 2.06 | 4.2436 | 0.594104 
  | ΣP(x) = 1 |  |  | Σ(x - μ)²P(x) = 1.6164  ← Variance 


So, the variance is 


\( \sigma^2 = 1.6164 \approx 1.6 \) 


and the standard deviation is 
\( \sigma = \sqrt{\sigma^2} = \sqrt{1.6164} \approx 1.3 \). 


Interpretation Most of the data values differ from the mean by no more 
than 1.3. 
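Both variance formulas, the definition in the box above and the alternative formula given in the Study Tip, can be evaluated side by side. Below is a minimal Python sketch using the distribution from this example; it keeps μ unrounded until the end, as the solution advises.

```python
import math

scores = {1: 0.16, 2: 0.22, 3: 0.28, 4: 0.20, 5: 0.14}

# Unrounded mean (2.94); rounding is left to the last step
mu = sum(x * p for x, p in scores.items())

# Definition: sigma^2 = sum of (x - mu)^2 * P(x)
variance = sum((x - mu) ** 2 * p for x, p in scores.items())

# Alternative formula: sigma^2 = [sum of x^2 * P(x)] - mu^2
variance_alt = sum(x ** 2 * p for x, p in scores.items()) - mu ** 2

sigma = math.sqrt(variance)
print(f"variance = {variance:.4f} (about {variance:.1f})")  # 1.6164 (about 1.6)
print(f"alternative formula gives {variance_alt:.4f}")  # 1.6164
print(f"standard deviation = {sigma:.4f} (about {sigma:.1f})")  # 1.2714 (about 1.3)
```

The alternative formula skips the x - μ column, which is convenient by hand; the two results agree up to floating-point rounding.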


TRY IT YOURSELF 6 


Find the variance and standard deviation of the probability distribution 
constructed in Try It Yourself 2. 
Answer: Page A34 




Expected Value 


The mean of a random variable represents what you would expect to happen 
over thousands of trials. It is also called the expected value. 


DEFINITION 


The expected value of a discrete random variable is equal to the mean of the 
random variable. 
Expected Value = \( E(x) = \mu = \sum xP(x) \) 


In most applications, an expected value of 0 has a practical interpretation. 
For instance, in games of chance, an expected value of 0 implies that a game is 
fair (an unlikely occurrence). In a profit and loss analysis, an expected value of 0 
represents the break-even point. 

Although probabilities can never be negative, the expected value of a 
random variable can be negative, as shown in the next example. 


Finding an Expected Value 


At a raffle, 1500 tickets are sold at $2 each for four prizes of $500, $250, $150,
and $75. You buy one ticket. Find the expected value and interpret its meaning.


SOLUTION 


To find the gain for each prize, subtract the price of the ticket from the prize. 
For instance, your gain for the $500 prize is 


$500 − $2 = $498


and your gain for the $250 prize is 
$250 − $2 = $248.


Write a probability distribution for the possible gains (or outcomes). Note that 
a gain represented by a negative number is a loss. 


Gain, x | $498 | $248 | $148 | $73 | −$2
Probability, P(x) | 1/1500 | 1/1500 | 1/1500 | 1/1500 | 1496/1500

(The gain −$2 represents a loss of $2.)


Then, using the probability distribution, you can find the expected value.
$$
\begin{aligned}
E(x) &= \sum xP(x)\\
&= \$498\cdot\frac{1}{1500} + \$248\cdot\frac{1}{1500} + \$148\cdot\frac{1}{1500} + \$73\cdot\frac{1}{1500} + (-\$2)\cdot\frac{1496}{1500}\\
&= -\$1.35
\end{aligned}
$$


Interpretation Because the expected value is negative, you can expect to lose 
an average of $1.35 for each ticket you buy. 
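The same expected-value calculation can be checked with exact fractions, which avoids any rounding in the 1/1500 probabilities. This is a minimal sketch; the helper `raffle_expected_gain` is hypothetical and assumes, as the distribution above does, that each prize goes to a different ticket so one ticket wins at most one prize.

```python
from fractions import Fraction


def raffle_expected_gain(num_tickets, price, prizes):
    """Expected gain for one ticket; assumes each prize is won by a distinct ticket."""
    win_p = Fraction(1, num_tickets)                              # P(a given prize)
    lose_p = Fraction(num_tickets - len(prizes), num_tickets)     # P(no prize)
    expected = sum((prize - price) * win_p for prize in prizes)   # gains from prizes
    expected += -price * lose_p                                   # loss of the ticket price
    return expected


e = raffle_expected_gain(1500, 2, [500, 250, 150, 75])
print(e, float(e))
```

Using `Fraction` keeps every term exact, so the only approximation is the final conversion to a decimal for display.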


TRY IT YOURSELF 7 


At a raffle, 2000 tickets are sold at $5 each for five prizes of $2000, $1000, $500,
$250, and $100. You buy one ticket. Find the expected value and interpret its
meaning.
Answer: Page A34 




4.1 EXERCISES 


For Extra Help: MyLab Statistics 


Building Basic Skills and Vocabulary 


1. What is a random variable? Give an example of a discrete random variable 
and a continuous random variable. Justify your answer. 


2. What is a discrete probability distribution? What are the two conditions that 
a discrete probability distribution must satisfy? 


3. Is the expected value of the probability distribution of a random variable 
always one of the possible values of x? Explain. 


4. What does the mean of a probability distribution represent? 
True or False? In Exercises 5-8, determine whether the statement is true or
false. If it is false, rewrite it as a true statement.


5. In most applications, continuous random variables represent counted data, 
while discrete random variables represent measured data. 


6. For a random variable x, the word random indicates that the value of x is 
determined by chance. 


7. The mean of the random variable of a probability distribution describes how 
the outcomes vary. 


8. The expected value of a random variable can never be negative. 
Graphical Analysis In Exercises 9-12, determine whether the graph on
the number line represents a discrete random variable or a continuous random
variable. Explain your reasoning.


9. The attendance at concerts for a rock group
[Number line graph; axis: Attendance, from 40,000 to 50,000]

10. The length of time student-athletes practice each week
[Number line graph; axis: Time (in hours), from 0 to 20]

11. The distance a baseball travels after being hit
[Number line graph; axis: Distance (in feet), from 0 to 600]

12. The speeds (in kilometers per hour) of top 5 soccer hits (Source: Goal)
[Number line graph; axis: Speeds, from 0 to 250]


Using and Interpreting Concepts 


Discrete Variables and Continuous Variables In Exercises 13-18,
determine whether the random variable x is discrete or continuous. Explain.


13. Let x represent the number of cars in a university parking lot. 

14. Let x represent the length of time it takes to complete an exam. 

15. Let x represent the number of times a book is issued from the library. 

16. Let x represent the number of tornadoes in the month of May in Oklahoma. 
17. Let x represent the weight of a student’s school bag. 


18. Let x represent the snowfall (in inches) in Nome, Alaska, last winter. 




Constructing and Graphing Discrete Probability Distributions In
Exercises 19 and 20, (a) construct a probability distribution, and (b) graph the
probability distribution using a histogram and describe its shape.


19. Cars The number of sedan cars per household in a small town


Cars | 0 | 1 | 2 | 3
Households | 598 | 462 | 381 | 259


20. Movie Times The number of movies watched per week by the residents of a
locality


Number of movies | 0 | 1 | 2 | 3 | 4 | 5
Residents | 25 | 35 | 40 | 64 | 88 | 148


21. Finding probabilities Use the probability distribution you made in Exercise 
19 to find the probability of randomly selecting a household that has (a) one 
or two cars, (b) less than two cars, (c) between one and three cars, inclusive, 
and (d) at least two cars. 


22. Finding probabilities Use the probability distribution you made in Exercise
20 to find the probability of randomly selecting a resident of the locality who
watches (a) two or three movies, (b) more than three movies, (c) between
one and four movies, inclusive, (d) between two and five movies, inclusive,
and (e) at most three movies.


23. Unusual Events In Exercise 19, would it be unusual for a household to
have three sedan cars? Explain your reasoning.


24. Unusual Events In Exercise 20, would it be unusual for a resident to
watch no movies in a week? Explain your reasoning.

Determining a Missing Probability In Exercises 25 and 26, determine the
missing probability for the probability distribution.


25.
x | 0 | 1 | 2 | 3 | 4
P(x) | 0.09 | 0.15 | ? | 0.26 | 0.17


26.
x | 0 | 1 | 2 | 3 | 4 | 5 | 6
P(x) | 0.09 | 0.11 | 0.16 | 0.25 | ? | 0.08 | 0.02


Identifying Probability Distributions In Exercises 27 and 28, determine 
whether the distribution is a probability distribution. If it is not a probability 
distribution, explain why. 


27.
x | 0 | 1 | 2 | 3 | 4
P(x) | 0.40 | 0.21 | 0.11 | 0.15 | 0.18


2 3 1 3 
P(x) 5 100 25 20 5 10 




Finding the Mean, Variance, and Standard Deviation In Exercises
29-34, (a) find the mean, variance, and standard deviation of the probability
distribution, and (b) interpret the results.


29. Books The number of books per shelf in a library


Books | 0 | 1 | 2 | 3 | 4 | 5
Probability | 0.002 | 0.018 | 0.054 | 0.199 | 0.715 | 0.012


30. Cricket The number of countries that played in each Women’s Cricket World
Cup from 1973 through 2013 (Source: ICC Women’s World Cup)


Number of countries | 4 | 5 | 7 | 8 | 10 | 11
Probability | 0.1 | 0.2 | 0.1 | 0.4 | 0.1 | 0.1


31. LED Lamps The number of defects per 1000 LED lamps inspected


Defects | 0 | 1 | 2 | 3 | 4 | 5
Probability | 0.440 | 0.285 | 0.163 | 0.076 | 0.019 | 0.017


32. A Grade The number of A grades received in different subjects per student


Grades | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
Probability | 0.008 | 0.099 | 0.185 | 0.161 | 0.289 | 0.121 | 0.081 | 0.056


33. Hurricanes The histogram shows the distribution of hurricanes that have 
hit the U.S. mainland from 1851 through 2015 by Saffir-Simpson category, 
where 1 is the weakest level and 5 is the strongest level. (Source: National 
Oceanic & Atmospheric Administration) 


[Figure for Exercise 33: histogram titled “U.S. Mainland Hurricanes”; horizontal axis: Category (1 to 5); vertical axis: Probability, P(x), from 0 to 0.45; one bar is labeled 0.411.]
[Figure for Exercise 34: histogram titled “Reviewer Ratings”; horizontal axis: Rating; vertical axis: Probability, P(x), from 0 to 0.45.]


34. Reviewer Ratings The histogram shows the reviewer ratings on a scale 
from 1 (lowest) to 5 (highest) of a recently published book. 


35. Writing The expected value of an accountant’s profit and loss analysis is 0. 
Explain what this means. 


36. Writing In a game of chance, what is the relationship between a “fair bet”
and its expected value? Explain.




Finding an Expected Value In Exercises 37 and 38, find the expected value
E(x) to the player for one play of the game. If x is the gain to a player in a game
of chance, then E(x) is usually negative. This value gives the average amount per
game the player can expect to lose.


37. In American roulette, the wheel has 38 numbers, 00, 0, 1, 2, ..., 34, 35,
and 36, marked on equally spaced slots. If a player bets $1 on a number
and wins, then the player keeps the dollar and receives an additional $35.
Otherwise, the dollar is lost.


38. A high school basketball team is selling $10 raffle tickets as part of a 
fund-raising program. The first prize is a trip to the Bahamas valued at 
$5460, and the second prize is a weekend ski package valued at $496. The 
remaining 18 prizes are $100 gas cards. The number of tickets sold is 3500. 


Extending Concepts 


Linear Transformation of a Random Variable In Exercises 39 and
40, use this information about linear transformations. For a random variable x,
a new random variable y can be created by applying a linear transformation
y = a + bx, where a and b are constants. If the random variable x has mean $\mu_x$
and standard deviation $\sigma_x$, then the mean, variance, and standard deviation of y are
given by the formulas
$$\mu_y = a + b\mu_x, \qquad \sigma_y^2 = b^2\sigma_x^2, \qquad \sigma_y = |b|\,\sigma_x.$$
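One way to see these formulas at work is to transform a distribution directly and compare. This sketch reuses the personality-test distribution from Example 6; the constants a = 10 and b = −3 are hypothetical values chosen only for illustration (a negative b shows why the absolute value appears in the standard deviation formula).

```python
import math


def moments(xs, ps):
    """Return (mean, variance, standard deviation) computed directly from a distribution."""
    mu = sum(x * p for x, p in zip(xs, ps))
    var = sum((x - mu) ** 2 * p for x, p in zip(xs, ps))
    return mu, var, math.sqrt(var)


# Personality-test scores from Example 6 of this section.
xs = [1, 2, 3, 4, 5]
ps = [0.16, 0.22, 0.28, 0.20, 0.14]

a, b = 10, -3  # hypothetical constants for the illustration

mu_x, var_x, sd_x = moments(xs, ps)
ys = [a + b * x for x in xs]       # each y-value keeps the probability of its x-value
mu_y, var_y, sd_y = moments(ys, ps)

print(mu_y, a + b * mu_x)          # mean:     mu_y = a + b * mu_x
print(var_y, b ** 2 * var_x)       # variance: sigma_y^2 = b^2 * sigma_x^2
print(sd_y, abs(b) * sd_x)         # std dev:  sigma_y = |b| * sigma_x
```

Each printed pair should agree up to floating-point rounding, because adding a shifts every value equally (no change in spread) while multiplying by b scales every distance from the mean by |b|.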


39. The mean annual salary of employees at an office is originally $46,000. Each 
employee receives an annual bonus of $600 and a 3% raise (based on salary). 
What is the new mean annual salary (including the bonus and raise)? 


40. The mean annual salary of employees at an office is originally $44,000 with 
a variance of 18,000,000. Each employee receives an annual bonus of $1000 
and a 3.5% raise (based on salary). What is the standard deviation of the new 
salaries? 


Independent and Dependent Random Variables Two random 
variables x and y are independent when the value of x does not affect the value 
of y. When the variables are not independent, they are dependent. A new random 
variable can be formed by finding the sum or difference of random variables. If
a random variable x has mean $\mu_x$ and a random variable y has mean $\mu_y$, then the
means of the sum and difference of the variables are given by the formulas below.


$$\mu_{x+y} = \mu_x + \mu_y \qquad \mu_{x-y} = \mu_x - \mu_y$$


If the random variables are independent, then the variance and standard deviation
of the sum or difference of the random variables can be found. So, if a random
variable x has variance $\sigma_x^2$ and a random variable y has variance $\sigma_y^2$, then the
variances of the sum and difference of the variables are given by the formulas
below. Note that the variance of the difference is the sum of the variances.


$$\sigma_{x+y}^2 = \sigma_x^2 + \sigma_y^2 \qquad \sigma_{x-y}^2 = \sigma_x^2 + \sigma_y^2$$
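These rules can be checked by building the distribution of x + y and x − y outright. In this sketch, x and y are the scores of two people selected independently and given the personality test from Example 6 (an illustrative choice, not one of the exercises); the helper names `moments` and `combine` are hypothetical.

```python
from itertools import product


def moments(dist):
    """Return (mean, variance) of a distribution given as {value: probability}."""
    mu = sum(v * p for v, p in dist.items())
    var = sum((v - mu) ** 2 * p for v, p in dist.items())
    return mu, var


def combine(dist_x, dist_y, op):
    """Distribution of op(x, y) for independent x and y, using P(x and y) = P(x) * P(y)."""
    out = {}
    for (x, px), (y, py) in product(dist_x.items(), dist_y.items()):
        value = op(x, y)
        out[value] = out.get(value, 0.0) + px * py
    return out


# Two independent test-takers, each with the Example 6 score distribution.
scores_x = {1: 0.16, 2: 0.22, 3: 0.28, 4: 0.20, 5: 0.14}
scores_y = {1: 0.16, 2: 0.22, 3: 0.28, 4: 0.20, 5: 0.14}

mu_x, var_x = moments(scores_x)
mu_y, var_y = moments(scores_y)
mu_sum, var_sum = moments(combine(scores_x, scores_y, lambda x, y: x + y))
mu_diff, var_diff = moments(combine(scores_x, scores_y, lambda x, y: x - y))

print(mu_sum, mu_x + mu_y)
print(mu_diff, mu_x - mu_y)
print(var_sum, var_x + var_y)
print(var_diff, var_x + var_y)   # the difference also adds the variances
```

The last line is the point of the note above: subtracting an independent variable still adds its variability, so var_diff matches var_x + var_y rather than var_x − var_y.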
In Exercises 41 and 42, the distribution of SAT mathematics scores for 
college-bound male seniors in 2016 has a mean of 524 and a standard deviation of 
126. The distribution of SAT mathematics scores for college-bound female seniors 
in 2016 has a mean of 494 and a standard deviation of 116. One male and one 
female are randomly selected. Assume their scores are independent. (Source: The 
College Board) 


41. What is the average sum of their scores? What is the average difference of 
their scores? 


42. What is the standard deviation of the difference of their scores? 


SECTION 4.2. Binomial Distributions 223 


4.2 Binomial Distributions


Binomial Experiments ■ Binomial Probability Formula ■ Finding Binomial Probabilities ■ Graphing Binomial Distributions ■ Mean, Variance, and Standard Deviation


• How to determine whether a probability experiment is a binomial experiment
• How to find binomial probabilities using the binomial probability formula
• How to find binomial probabilities using technology, formulas, and a binomial probability table
• How to construct and graph a binomial distribution
• How to find the mean, variance, and standard deviation of a binomial probability distribution


Binomial Experiments


There are many probability experiments for which the results of each trial can
be reduced to two outcomes: success and failure. For instance, when a basketball
player attempts a free throw, he or she either makes the basket or does not.
Probability experiments such as these are called binomial experiments.


DEFINITION


A binomial experiment is a probability experiment that satisfies these
conditions.

1. The experiment has a fixed number of trials, where each trial is
independent of the other trials.
2. There are only two possible outcomes of interest for each trial. Each
outcome can be classified as a success (S) or as a failure (F).
3. The probability of a success is the same for each trial.
4. The random variable x counts the number of successful trials.


Symbol | Description
n | The number of trials
p | The probability of success in a single trial
q | The probability of failure in a single trial (q = 1 − p)
x | The random variable represents a count of the number of successes in n trials: x = 0, 1, 2, 3, ..., n.


In a binomial experiment, success does not imply something good occurred.
For instance, in a survey that asks 1012 people about identity theft, a success
is a person who was a victim of identity theft.

Here is an example of a binomial experiment. From a standard deck of
cards, you pick a card, note whether it is a club or not, and replace the card. You
repeat the experiment five times, so n = 5. The outcomes of each trial can be
classified in two categories: S = selecting a club and F = selecting another suit.
The probabilities of success and failure are
$$p = \frac{1}{4} \quad\text{and}\quad q = 1 - \frac{1}{4} = \frac{3}{4}.$$
The random variable x represents the number of clubs selected in the five
trials. So, the possible values of the random variable are x = 0, 1, 2, 3, 4, 5. For
instance, if x = 2, then exactly two of the five cards are clubs and the other three
are not clubs. An example of an experiment with x = 2 is shown at the left. Note
that x is a discrete random variable because its possible values can be counted.


[Figure: a table with columns Trial, Outcome, and S or F? for the five card draws. There are two successful outcomes. So, x = 2.]
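The card experiment is easy to simulate, which makes the four conditions concrete: a fixed number of draws, two outcomes per draw, and replacement keeping p = 1/4 on every trial. This is a minimal sketch using Python's `random` module with a fixed seed; the deck representation and the helper `count_clubs` are illustrative assumptions, not from the text.

```python
import random

SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = [(rank, suit) for suit in SUITS for rank in range(1, 14)]  # 52 cards


def count_clubs(n_draws, rng):
    """Pick a card n_draws times, replacing it each time, and count the clubs."""
    successes = 0
    for _ in range(n_draws):
        _rank, suit = rng.choice(DECK)  # deck is never changed, so p stays 1/4
        if suit == "clubs":
            successes += 1
    return successes


rng = random.Random(7)          # fixed seed so the run is repeatable
repetitions = 50_000
counts = [count_clubs(5, rng) for _ in range(repetitions)]
freq_two = counts.count(2) / repetitions
print(f"relative frequency of x = 2: {freq_two:.4f}")
```

With many repetitions, the relative frequency of x = 2 should settle near the exact binomial value, 10(1/4)²(3/4)³ ≈ 0.264, which the binomial probability formula later in this section produces directly.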




Picturing the World


A recent survey of 1520 U.S. adults 
was conducted to study the ways 
in which Americans use social 
media. One of the questions from 
the survey and the responses 
(either yes or no) are shown 
below. (Source: Pew Research) 


Survey question: Do you ever use 
the Internet or a mobile app to use 
Facebook? 


[Figure: pie chart of the responses; Yes: 68%]


Why is this a binomial experiment? 
Identify the probability of success p. 
Identify the probability of failure q. 


Identifying and Understanding Binomial Experiments 


Determine whether each experiment is a binomial experiment. If it is, specify 
the values of n, p, and q, and list the possible values of the random variable x. 
If it is not, explain why. 


1. A certain surgical procedure has an 85% chance of success. A doctor 
performs the procedure on eight patients. The random variable represents 
the number of successful surgeries. 


2. A jar contains five red marbles, nine blue marbles, and six green marbles. 
You randomly select three marbles from the jar, without replacement. The 
random variable represents the number of red marbles. 


SOLUTION 


1. The experiment is a binomial experiment because it satisfies the four 
conditions of a binomial experiment. In the experiment, each surgery 
represents one trial. There are eight surgeries, and each surgery is 
independent of the others. There are only two possible outcomes for each 
surgery—either the surgery is a success or it is a failure. Also, the probability 
of success for each surgery is 0.85. Finally, the random variable x represents 
the number of successful surgeries. 


Number of trials: n = 8
Probability of success: p = 0.85
Probability of failure: q = 1 − 0.85 = 0.15
Possible values of x: x = 0, 1, 2, 3, 4, 5, 6, 7, 8


2. The experiment is not a binomial experiment because it does not satisfy 
all four conditions of a binomial experiment. In the experiment, each 
marble selection represents one trial, and selecting a red marble is a 
success. When the first marble is selected, the probability of success is 5/20. 
However, because the marble is not replaced, the probability of success for 
subsequent trials is no longer 5/20. So, the trials are not independent, and 
the probability of a success is not the same for each trial. 


TRY IT YOURSELF 1 


Determine whether the experiment is a binomial experiment. If it is, specify 
the values of n, p, and q, and list the possible values of the random variable x. 
If it is not, explain why. 


You take a multiple-choice quiz that consists of 10 questions. 
Each question has four possible answers, only one of which is 
correct. To complete the quiz, you randomly guess the answer 
to each question. The random variable represents the number 
of correct answers. 
Answer: Page A34 


For a random sample collected without replacement, such as in a survey,
the events are dependent. However, you can treat this situation as a binomial
experiment by treating the events as independent when the sample size is no
more than 5% of the population. That is, n ≤ 0.05N, where N is the population size.
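To see why the 5% guideline works, one can compare the exact without-replacement probabilities (the hypergeometric distribution, a model this text does not cover) with the binomial ones. The sketch below uses hypothetical populations of N = 40, 120, and 10,000 in which 68% are successes (the Facebook proportion from the survey in this section) and a sample of n = 6; the helper names are illustrative.

```python
from math import comb


def without_replacement_pmf(x, N, K, n):
    """P(x successes) when drawing n of N items without replacement, K of them successes."""
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)


def binomial_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def max_abs_gap(N, K, n):
    """Largest difference between the exact and the binomial probabilities over x = 0..n."""
    p = K / N
    return max(abs(without_replacement_pmf(x, N, K, n) - binomial_pmf(x, n, p))
               for x in range(n + 1))


n = 6
for N in (40, 120, 10_000):        # hypothetical population sizes
    K = round(0.68 * N)            # hypothetical number of successes in the population
    print(N, n / N <= 0.05, round(max_abs_gap(N, K, n), 4))
```

As N grows relative to n, the gap shrinks toward zero, which is the sense in which treating the draws as independent becomes a safe approximation.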


In the binomial probability formula, ${}_nC_x$ determines the number of ways of
getting x successes in n trials, regardless of order.


$${}_nC_x = \frac{n!}{(n - x)!\,x!}$$


Study Tip
Recall that n! is read “n factorial” and represents the product of all integers
from n to 1. For instance,
$$5! = 5\cdot 4\cdot 3\cdot 2\cdot 1 = 120.$$




Binomial Probability Formula 


There are several ways to find the probability of x successes in n trials of a 
binomial experiment. One way is to use a tree diagram and the Multiplication 
Rule. Another way is to use the binomial probability formula. 


Binomial Probability Formula 


In a binomial experiment, the probability of exactly x successes in n trials is


$$P(x) = {}_nC_x\, p^x q^{n-x} = \frac{n!}{(n - x)!\,x!}\, p^x q^{n-x}.$$


Note that the number of failures is n − x.
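The formula maps almost symbol for symbol onto Python's standard library, where `math.comb(n, x)` computes ${}_nC_x$. This is a minimal sketch; the function name `binomial_pmf` is an illustrative choice.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(x) = nCx * p^x * q^(n - x): probability of exactly x successes in n trials."""
    if not 0 <= x <= n:
        return 0.0
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)


print(comb(5, 2))                          # ways to get 2 successes in 5 trials
print(round(binomial_pmf(2, 3, 0.9), 3))   # the surgery example that follows
```

For valid inputs the probabilities P(0), ..., P(n) sum to 1, which is a quick sanity check on any implementation.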


Finding a Binomial Probability 

Rotator cuff surgery has a 90% chance of success. The surgery is performed on 
three patients. Find the probability of the surgery being successful on exactly 
two patients. (Source: The Orthopedic Center of St. Louis) 


SOLUTION 
Method 1: Draw a tree diagram and use the Multiplication Rule.

Outcome (1st, 2nd, 3rd surgery) | Number of successes | Probability
SSS | 3 | (9/10)(9/10)(9/10) = 729/1000
SSF | 2 | (9/10)(9/10)(1/10) = 81/1000
SFS | 2 | (9/10)(1/10)(9/10) = 81/1000
SFF | 1 | (9/10)(1/10)(1/10) = 9/1000
FSS | 2 | (1/10)(9/10)(9/10) = 81/1000
FSF | 1 | (1/10)(9/10)(1/10) = 9/1000
FFS | 1 | (1/10)(1/10)(9/10) = 9/1000
FFF | 0 | (1/10)(1/10)(1/10) = 1/1000


There are three outcomes that have exactly two successes, and each has a
probability of 81/1000. So, the probability of a successful surgery on exactly two
patients is
$$3\left(\frac{81}{1000}\right) = 0.243.$$
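Method 1 can be automated by listing every branch of the tree with `itertools.product` and multiplying along each branch. This sketch uses exact fractions so the result matches the hand calculation; the helper name `outcome_probabilities` is hypothetical.

```python
from fractions import Fraction
from itertools import product

P_SUCCESS = Fraction(9, 10)  # rotator cuff surgery success rate from the example


def outcome_probabilities(n_trials, p):
    """List every S/F sequence (each branch of the tree) with its probability."""
    q = 1 - p
    rows = []
    for seq in product("SF", repeat=n_trials):
        prob = Fraction(1)
        for outcome in seq:
            prob *= p if outcome == "S" else q   # Multiplication Rule along the branch
        rows.append(("".join(seq), seq.count("S"), prob))
    return rows


rows = outcome_probabilities(3, P_SUCCESS)
for label, successes, prob in rows:
    print(label, successes, prob)
p_two = sum(prob for _, successes, prob in rows if successes == 2)
print("P(exactly 2) =", p_two, "=", float(p_two))
```

The tree has 2ⁿ branches, so this approach is practical only for small n, which is why the binomial probability formula in Method 2 is usually preferred.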

Method 2: Use the binomial probability formula.

In this binomial experiment, the values of n, p, q, and x are
$$n = 3,\quad p = \frac{9}{10},\quad q = \frac{1}{10},\quad\text{and}\quad x = 2.$$


The probability of exactly two successful surgeries is


$$P(2) = \frac{3!}{(3-2)!\,2!}\left(\frac{9}{10}\right)^2\left(\frac{1}{10}\right)^1 = 3\left(\frac{81}{100}\right)\left(\frac{1}{10}\right) = 3\left(\frac{81}{1000}\right) = 0.243.$$


TRY IT YOURSELF 2


A card is selected from a standard deck and replaced. This experiment is repeated
a total of five times. Find the probability of selecting exactly three clubs.
Answer: Page A34




By listing the possible values of x with the corresponding probabilities, you 
can construct a binomial probability distribution. 


Constructing a Binomial Distribution 


In a survey, U.S. adults were asked to identify which social media platforms 
they use. The results are shown in the figure. Six adults who participated in 
the survey are randomly selected and asked whether they use the social media 
platform Facebook. Construct a binomial probability distribution for the 
number of adults who respond yes. (Source: Pew Research) 


SOLUTION 
From the figure, you can see that 68% of adults use the social media platform
Facebook. So,
$$p = 0.68 \quad\text{and}\quad q = 0.32.$$
Because n = 6, the possible values of x are 0, 1, 2, 3, 4, 5, and 6. The
probabilities of each value of x are
$$
\begin{aligned}
P(0) &= {}_6C_0(0.68)^0(0.32)^6 = 1(0.68)^0(0.32)^6 \approx 0.001\\
P(1) &= {}_6C_1(0.68)^1(0.32)^5 = 6(0.68)^1(0.32)^5 \approx 0.014\\
P(2) &= {}_6C_2(0.68)^2(0.32)^4 = 15(0.68)^2(0.32)^4 \approx 0.073\\
P(3) &= {}_6C_3(0.68)^3(0.32)^3 = 20(0.68)^3(0.32)^3 \approx 0.206\\
P(4) &= {}_6C_4(0.68)^4(0.32)^2 = 15(0.68)^4(0.32)^2 \approx 0.328\\
P(5) &= {}_6C_5(0.68)^5(0.32)^1 = 6(0.68)^5(0.32)^1 \approx 0.279\\
P(6) &= {}_6C_6(0.68)^6(0.32)^0 = 1(0.68)^6(0.32)^0 \approx 0.099.
\end{aligned}
$$

x | P(x)
0 | 0.001
1 | 0.014
2 | 0.073
3 | 0.206
4 | 0.328
5 | 0.279
6 | 0.099
ΣP(x) = 1

Notice in the table that all the probabilities are between 0 and 1 and
that the sum of the probabilities is 1.

Study Tip
When probabilities are rounded to a fixed number of decimal places, the sum of
the probabilities may differ slightly from 1.
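Constructing the whole distribution is just the binomial formula applied to each x from 0 to n. This minimal sketch (the function name `binomial_distribution` is illustrative) reproduces the table above and also shows the Study Tip's point about rounding.

```python
from math import comb


def binomial_distribution(n, p):
    """Return [P(0), P(1), ..., P(n)] for a binomial experiment."""
    q = 1 - p
    return [comb(n, x) * p ** x * q ** (n - x) for x in range(n + 1)]


dist = binomial_distribution(6, 0.68)   # six adults, p = 0.68 (Facebook users)
for x, prob in enumerate(dist):
    print(x, round(prob, 3))
print("sum of unrounded probabilities:", sum(dist))
print("sum of rounded probabilities:", round(sum(round(v, 3) for v in dist), 3))
```

Here the rounded values happen to sum to exactly 1; with other choices of n and p they may be off by a few thousandths even though the unrounded sum is 1.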


TRY IT YOURSELF 3 


Five adults who participated in the survey in Example 3 are randomly selected 
and asked whether they use the social media platform Instagram. Construct a 
binomial distribution for the number of adults who respond yes. 

Answer: Page A35 




Finding Binomial Probabilities 


In Examples 2 and 3, you used the binomial probability formula to find the 
probabilities. A more efficient way to find binomial probabilities is to use 
technology. For instance, you can find binomial probabilities using Minitab, 
Excel, StatCrunch, and the TI-84 Plus. 


Tech Tip
You can use technology such as Minitab, Excel, StatCrunch, or the TI-84 Plus
to find a binomial probability. For instance, here are instructions for finding
a binomial probability on a TI-84 Plus. From the DISTR menu, choose the
binompdf( feature. Enter the values of n, p, and x. Then calculate the
probability.


Finding a Binomial Probability Using Technology 


A survey found that 26% of U.S. adults believe there is no difference between 
secured and unsecured wireless networks. (A secured network uses barriers, 
such as firewalls and passwords, to protect information; an unsecured network 
does not.) You randomly select 100 adults. What is the probability that exactly 
35 adults believe there is no difference between secured and unsecured 
networks? Use technology to find the probability. (Source: University of Phoenix) 


SOLUTION 


Minitab, Excel, StatCrunch, and the TI-84 Plus each have features that allow 
you to find binomial probabilities. Try using these technologies. You should 
obtain results similar to these displays. 


MINITAB
Binomial with n = 100 and p = 0.26
x | P(X = x)
35 | 0.0115763

STATCRUNCH
Binomial Distribution
n: 100  p: 0.26
P(X = 35) = 0.0115763

TI-84 PLUS
binompdf(100, .26, 35)
.0115762984

EXCEL
=BINOM.DIST(35,100,0.26,FALSE)
0.011576298


Study Tip
Recall that a probability of 0.05 or less is considered unusual.


Interpretation From these displays, you can see that the probability that 
exactly 35 adults believe there is no difference between secured and unsecured 
networks is about 0.012. Because 0.012 is less than 0.05, this can be considered 
an unusual event. 
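The same value can be computed without a calculator. This sketch uses only the Python standard library; the function name `binompdf` is chosen to mirror the TI-84 argument order (n, p, x) and is not a real Python API. If SciPy is available, `scipy.stats.binom.pmf(35, 100, 0.26)` is an existing alternative that returns the same probability.

```python
from math import comb


def binompdf(n, p, x):
    """P(X = x) for a binomial random variable, arguments ordered like the TI-84."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


prob = binompdf(100, 0.26, 35)
print(f"P(X = 35) = {prob:.7f}")
print("unusual" if prob <= 0.05 else "not unusual")
```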


TRY IT YOURSELF 4 


A survey found that 52% of U.S. adults associate professional football with 
negative moral values. You randomly select 150 adults. What is the probability 
that exactly 65 adults associate professional football with negative moral 
values? Use technology to find the probability. (Source: The Harris Poll) 
Answer: Page A35 




Study Tip
The complement of “x is at least 2” is “x is less than 2.” So, another way to find
the probability in part 3 of Example 5 is
$$P(x < 2) = 1 - P(x \ge 2) = 1 - 0.137 = 0.863.$$


TI-84 PLUS
binompdf(4, .17, 2)
.11945526
binomcdf(4, .17, 1)
.86339837


Finding Binomial Probabilities Using Formulas 

A survey found that 17% of U.S. adults say that Google News is a major source 

of news for them. You randomly select four adults and ask them whether 

Google News is a major source of news for them. Find the probability that 

(1) exactly two of them respond yes, (2) at least two of them respond yes, and 

(3) fewer than two of them respond yes. (Source: Ipsos Public Affairs) 

SOLUTION 

1. Using n = 4, p = 0.17, q = 0.83, and x = 2, the probability that exactly 
two adults will respond yes is 

$$P(2) = {}_4C_2(0.17)^2(0.83)^2 = 6(0.17)^2(0.83)^2 \approx 0.119.$$

2. To find the probability that at least two adults will respond yes, find the sum 
of P(2), P(3), and P(4). Begin by using the binomial probability formula 
to write an expression for each probability. 

$$
\begin{aligned}
P(2) &= {}_4C_2(0.17)^2(0.83)^2 = 6(0.17)^2(0.83)^2\\
P(3) &= {}_4C_3(0.17)^3(0.83)^1 = 4(0.17)^3(0.83)^1\\
P(4) &= {}_4C_4(0.17)^4(0.83)^0 = 1(0.17)^4(0.83)^0
\end{aligned}
$$


So, the probability that at least two will respond yes is
$$
\begin{aligned}
P(x \ge 2) &= P(2) + P(3) + P(4)\\
&= 6(0.17)^2(0.83)^2 + 4(0.17)^3(0.83)^1 + 1(0.17)^4(0.83)^0\\
&\approx 0.137.
\end{aligned}
$$


3. To find the probability that fewer than two adults will respond yes, find the 
sum of P(0) and P(1). Begin by using the binomial probability formula to 
write an expression for each probability. 


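$$
\begin{aligned}
P(0) &= {}_4C_0(0.17)^0(0.83)^4 = 1(0.17)^0(0.83)^4\\
P(1) &= {}_4C_1(0.17)^1(0.83)^3 = 4(0.17)^1(0.83)^3
\end{aligned}
$$
So, the probability that fewer than two will respond yes is
$$
\begin{aligned}
P(x < 2) &= P(0) + P(1)\\
&= 1(0.17)^0(0.83)^4 + 4(0.17)^1(0.83)^3\\
&\approx 0.863.
\end{aligned}
$$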


TRY IT YOURSELF 5 


The survey in Example 5 found that 27% of U.S. adults say that CNN is a 
major source of news for them. You randomly select five adults and ask them 
whether CNN is a major source of news for them. Find the probability that 
(1) exactly two of them respond yes, (2) at least two of them respond yes, and 
(3) fewer than two of them respond yes. (Source: Ipsos Public Affairs) 

Answer: Page A35 


You can use technology to check your answers. For instance, the TI-84 Plus 
screen at the left shows how to check parts 1 and 3 of Example 5. Note that the 
second entry uses the binomial CDF feature. A cumulative distribution function 
(CDF) computes the probability of “x or fewer” successes by adding the areas 
for the given x-value and all those to its left. 
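A cumulative probability is just a running sum of single probabilities, and "at least" questions follow from the complement rule. This sketch mirrors the TI-84 checks for Example 5; the names `binompdf` and `binomcdf` copy the calculator's feature names for readability but are defined here, not imported from any library.

```python
from math import comb


def binompdf(n, p, x):
    """P(X = x) for a binomial random variable (TI-84 argument order)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def binomcdf(n, p, x):
    """P(X <= x): add the probabilities for x and every value to its left."""
    return sum(binompdf(n, p, k) for k in range(x + 1))


n, p = 4, 0.17                       # Example 5: four adults, 17% say yes
exactly_two = binompdf(n, p, 2)
fewer_than_two = binomcdf(n, p, 1)
at_least_two = 1 - fewer_than_two    # complement rule
print(round(exactly_two, 3), round(at_least_two, 3), round(fewer_than_two, 3))
```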


To explore this topic further, 
see Activity 4.2 on page 236. 




Finding binomial probabilities with the binomial probability formula can
be a tedious process. To make this process easier, you can use a binomial
probability table. Table 2 in Appendix B lists the binomial probabilities for
selected values of n and p.


Finding a Binomial Probability Using a Table 


About 10% of workers (ages 16 years and older) in the United States commute 
to their jobs by carpooling. You randomly select eight workers. What is the 
probability that exactly four of them carpool to work? Use a table to find the 
probability. (Source: American Community Survey) 


SOLUTION 


A portion of Table 2 in Appendix B is shown here. Using the distribution for 
n = 8 and p = 0.1, you can find the probability that x = 4, as shown by the 
highlighted areas in the table. 


[Portion of Table 2 in Appendix B: binomial probabilities for selected values of n (including n = 2, 3, and 8) and p (from .01 to .60). In the n = 8 block, the highlighted entry in row x = 4 and column p = .10 is .005.]


According to the table, the probability is 0.005. You can check this result using
technology. As shown below using Minitab, the probability is 0.0045927.


MINITAB
Probability Density Function
Binomial with n = 8 and p = 0.1
x | P(X = x)
4 | 0.0045927


After rounding to three decimal places, the probability is 0.005, which is the same
value found using the table.


Interpretation So, the probability that exactly four of the eight workers 
carpool to work is 0.005. Because 0.005 is less than 0.05, this can be considered 
an unusual event. 
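Each column of a binomial probability table is the binomial formula evaluated for x = 0 to n and rounded to three decimal places. This sketch rebuilds the n = 8, p = 0.1 column used in this example; the helper `table_column` is hypothetical, and it assumes the table rounds to three places, as the entries shown here do.

```python
from math import comb


def table_column(n, p, places=3):
    """One column of a binomial probability table: P(x) for x = 0..n, rounded."""
    return [round(comb(n, x) * p ** x * (1 - p) ** (n - x), places)
            for x in range(n + 1)]


column = table_column(8, 0.1)
for x, prob in enumerate(column):
    print(f"x = {x}: {prob:.3f}")
```

Rounding is why the table shows .005 while Minitab reports 0.0045927; entries printed as .000 are small but not exactly zero.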


TRY IT YOURSELF 6 


About 5% of workers (ages 16 years and older) in the United States commute 
to their jobs by using public transportation (excluding taxicabs). You randomly 
select six workers. What is the probability that exactly two of them use public 
transportation to get to work? Use a table to find the probability. (Source: 
American Community Survey) Answer: Page A35 




Graphing Binomial Distributions 


In Section 4.1, you learned how to graph discrete probability distributions. 
Because a binomial distribution is a discrete probability distribution, you can use 
the same process. 


Graphing a Binomial Distribution 

Sixty-two percent of cancer survivors are ages 65 years or older. You randomly 
select six cancer survivors and ask them whether they are 65 years of age or 
older. Construct a probability distribution for the random variable x. Then 
graph the distribution. (Source: National Cancer Institute) 


SOLUTION 


To construct the binomial distribution, find the probability for each value of x. 
Using n = 6, p = 0.62, and q = 0.38, you can obtain the following.


x | 0 | 1 | 2 | 3 | 4 | 5 | 6
P(x) | 0.003 | 0.029 | 0.120 | 0.262 | 0.320 | 0.209 | 0.057


Notice in the table that all the probabilities are between 0 and 1 and that the 
sum of the probabilities is 1. You can graph the probability distribution using 
a histogram as shown below. 


[Figure: histogram titled “Cancer Survivors 65 Years of Age or Older”; horizontal axis: Survivors (x); vertical axis: Probability, P(x), from 0 to 0.35.]


Interpretation From the histogram, you can see that it would be unusual for 
none or only one of the survivors to be age 65 years or older because both 
probabilities are less than 0.05. 


TRY IT YOURSELF 7 


A recent study found that 28% of U.S. adults read an ebook in the last 
12 months. You randomly select 4 adults and ask them whether they read 
an ebook in the last 12 months. Construct a probability distribution for the 
random variable x. Then graph the distribution. (Source: Pew Research) 
Answer: Page A35 


Notice in Example 7 that the histogram is skewed left. The graph of a 
binomial distribution with p > 0.5 is skewed left, whereas the graph of a 
binomial distribution with p < 0.5 is skewed right. The graph of a binomial 
distribution with p = 0.5 is symmetric. 
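A plain-text histogram is a quick way to see this skew without graphing software. The sketch below prints bars for n = 6 with p = 0.62 (Example 7), p = 0.5, and p = 0.25; the helper names and the 50-character bar width are illustrative choices.

```python
from math import comb


def binomial_distribution(n, p):
    """Return [P(0), P(1), ..., P(n)] for a binomial experiment."""
    return [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]


def text_histogram(dist, width=50):
    """One row per x: the probability and a bar of '#' scaled to the tallest bar."""
    tallest = max(dist)
    lines = []
    for x, prob in enumerate(dist):
        bar = "#" * round(width * prob / tallest)
        lines.append(f"{x:>2} | {prob:.3f} {bar}")
    return "\n".join(lines)


for p in (0.62, 0.5, 0.25):
    print(f"n = 6, p = {p}")
    print(text_histogram(binomial_distribution(6, p)))
    print()
```

With p = 0.62 the tall bars sit at the high end and the tail stretches left; p = 0.5 gives mirror-image bars; p = 0.25 pushes the mass left with a tail to the right.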




Mean, Variance, and Standard Deviation 


Although you can use the formulas you learned in Section 4.1 for mean, variance, 
and standard deviation of a discrete probability distribution, the properties of a 
binomial distribution enable you to use much simpler formulas. 


Population Parameters of a Binomial Distribution 


Mean: $\mu = np$


Variance: $\sigma^2 = npq$


Standard deviation: $\sigma = \sqrt{npq}$


EXAMPLE 8 


Finding and Interpreting Mean, Variance, and 
Standard Deviation 


In Pittsburgh, Pennsylvania, about 56% of the days in a year are cloudy. Find 
the mean, variance, and standard deviation for the number of cloudy days 
during the month of June. Interpret the results and determine any unusual 
values. (Source: National Climatic Data Center) 


SOLUTION 
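There are 30 days in June. Using n = 30, p = 0.56, and q = 0.44, you can find
the mean, variance, and standard deviation as shown below.
$$
\begin{aligned}
\mu &= np = 30 \cdot 0.56 = 16.8 && \text{Mean}\\
\sigma^2 &= npq = 30 \cdot 0.56 \cdot 0.44 \approx 7.4 && \text{Variance}\\
\sigma &= \sqrt{npq} = \sqrt{30 \cdot 0.56 \cdot 0.44} \approx 2.7 && \text{Standard deviation}
\end{aligned}
$$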


Interpretation On average, there are 16.8 cloudy days during the month of
June. The standard deviation is about 2.7 days. Values that are more than two
standard deviations from the mean are considered unusual. Because

$16.8 - 2(2.7) = 11.4$

a June with 11 cloudy days or fewer would be unusual. Similarly, because

$16.8 + 2(2.7) = 22.2$

a June with 23 cloudy days or more would also be unusual.
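The arithmetic in Example 8, including the two-standard-deviation cutoffs, can be reproduced with the short Python sketch below. It keeps the standard deviation unrounded (about 2.72 instead of 2.7), which shifts the cutoffs slightly (to about 11.36 and 22.24) but leads to the same unusual values: 11 or fewer and 23 or more cloudy days.

```python
from math import sqrt

n, p = 30, 0.56          # days in June, proportion of cloudy days in Pittsburgh
q = 1 - p

mu = n * p               # mean
var = n * p * q          # variance
sigma = sqrt(var)        # standard deviation

low, high = mu - 2 * sigma, mu + 2 * sigma
print(f"mean = {mu:.1f}, variance = {var:.1f}, standard deviation = {sigma:.1f}")
print(f"values below {low:.2f} or above {high:.2f} are unusual")

# Counts of cloudy days (0 through 30) lying more than 2 standard deviations from the mean
unusual = [x for x in range(n + 1) if x < low or x > high]
print("unusual counts:", unusual)
```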


TRY IT YOURSELF 8 


In San Francisco, California, about 44% of the days in a year are clear. Find 
the mean, variance, and standard deviation for the number of clear days 
during the month of May. Interpret the results and determine any unusual 
events. (Source: National Climatic Data Center) 

Answer: Page A35 


232 CHAPTER 4 _ Discrete Probability Distributions 


4.2 EXERCISES 


For Extra Help: MyLab Statistics 


Building Basic Skills and Vocabulary 


1. In a binomial experiment, what does it mean to say that each trial is 
independent of the other trials? 


2. In a binomial experiment with n trials, what does the random variable 
measure? 


Graphical Analysis In Exercises 3-5, the histogram represents a binomial
distribution with 5 trials. Match the histogram with the appropriate probability of
success p. Explain your reasoning.


(a) p = 0.25    (b) p = 0.50    (c) p = 0.75


[Histograms for Exercises 3, 4, and 5: P(x) versus x for x = 0, 1, 2, 3, 4, 5]


Graphical Analysis In Exercises 6-8, the histogram represents a binomial
distribution with probability of success p. Match the histogram with the appropriate
number of trials n. Explain your reasoning. What happens as the value of n
increases and p remains the same?


(a) n = 4    (b) n = 8    (c) n = 12


[Histograms for Exercises 6, 7, and 8: P(x) versus x for x = 0 to 12]


9. Identify the unusual values of x in each histogram in Exercises 3-5. 


10. Identify the unusual values of x in each histogram in Exercises 6-8. 


Mean, Variance, and Standard Deviation In Exercises 11-14, find the
mean, variance, and standard deviation of the binomial distribution with the given
values of n and p.


11. n = 50, p = 0.4 12. n = 84, p = 0.65 
13. n = 124, p = 0.26 14. n = 316, p = 0.82 


Using and Interpreting Concepts 


Identifying and Understanding Binomial Experiments In Exercises
15-18, determine whether the experiment is a binomial experiment. If it is, identify
a success, specify the values of n, p, and q, and list the possible values of the
random variable x. If it is not a binomial experiment, explain why.


15. Video Games A survey found that 36% of frequent gamers play video 
games on their smartphones. Ten frequent gamers are randomly selected. 
The random variable represents the number of frequent gamers who play 
video games on their smartphones. (Source: Entertainment Software Association)


16. Lucky Toss A person is required to toss 8 unbiased coins and note down
the outcome of each. The random variable represents the number of heads.


17. Cards You draw four cards, one at a time, from a standard deck. You note
the suit and replace the card in the deck. The random variable represents the
number of cards that are diamonds.


18. Women Who Are Mothers A survey found that 42% of women ages 18
to 33 are mothers. Eight women ages 18 to 33 are randomly selected. The
random variable represents the number of women ages 18 to 33 who are
mothers. (Source: Pew Research Center)


Finding Binomial Probabilities In Exercises 19-26, find the indicated 
probabilities. If convenient, use technology or Table 2 in Appendix B. 


19. Gamers Fifty-two percent of the women in the UK play video games
regularly. You randomly select seven women in the UK. Find the
probability that the number of women in the UK who are gamers is
(a) exactly four, (b) at least five, and (c) less than four. (Source: The Guardian)


20. Online Consumers Thirty-three percent of online consumers in Russia
prefer to shop online using smartphones. You randomly select 12 consumers.
Find the probability that the number of online consumers who prefer to shop
online using smartphones is (a) exactly six, (b) more than six, and (c) at most
six. (Source: East-West Digital News)


21. Flu Shots Fifty-six percent of U.S. adults say they intend to get a flu shot.
You randomly select 10 U.S. adults. Find the probability that the number of
U.S. adults who intend to get a flu shot is (a) exactly four, (b) at least five,
and (c) less than seven. (Source: Rasmussen Reports)


22. Fast Food Eleven percent of U.S. adults eat fast food four to six times
per week. You randomly select 12 U.S. adults. Find the probability that
the number of U.S. adults who eat fast food four to six times per week is
(a) exactly five, (b) at least two, and (c) less than three. (Source: Statista)


23. Consumer Electronics Forty percent of consumers prefer to purchase
electronics online. You randomly select 11 consumers. Find the probability
that the number of consumers who prefer to purchase electronics online is
(a) exactly five, (b) more than five, and (c) at most five. (Source: PwC)


24. Grocery Shopping Twenty percent of consumers prefer to purchase
groceries online. You randomly select 16 consumers. Find the probability
that the number of consumers who prefer to purchase groceries online is
(a) exactly one, (b) more than one, and (c) at most one. (Source: PwC)


25. Workplace Drug Testing Four percent of the U.S. workforce tests positive
for illicit drugs. You randomly select 14 workers. Find the probability that
the number of workers who test positive for illicit drugs is (a) exactly two,
(b) more than two, and (c) between two and five, inclusive. (Source: Quest
Diagnostics)


26. Tax Holiday Forty-four percent of U.S. adults say they are more likely to
make purchases during a sales tax holiday. You randomly select 15 adults.
Find the probability that the number of adults who say they are more likely
to make purchases during a sales tax holiday is (a) exactly seven, (b) more
than seven, and (c) between seven and eleven, inclusive. (Source: Rasmussen
Reports)


Constructing and Graphing Binomial Distributions In Exercises
27-30, (a) construct a binomial distribution, (b) graph the binomial distribution
using a histogram and describe its shape, and (c) identify any values of the random
variable x that you would consider unusual. Explain your reasoning.


27. Working Mothers Forty-nine percent of working mothers do not have 
enough money to cover their health insurance deductibles. You randomly 
select seven working mothers and ask them whether they have enough 
money to cover their health insurance deductibles. The random variable 
represents the number of working mothers who do not have enough money 
to cover their health insurance deductibles. (Source: Aflac) 


28. Workplace Cleanliness Fifty-seven percent of employees judge their peers 
by the cleanliness of their workspaces. You randomly select 10 employees 
and ask them whether they judge their peers by the cleanliness of their 
workspaces. The random variable represents the number of employees who 
judge their peers by the cleanliness of their workspaces. (Source: Adecco) 


29. Living to Age 100 Seventy-seven percent of adults want to live to age 100. 
You randomly select five adults and ask them whether they want to live to 
age 100. The random variable represents the number of adults who want to 
live to age 100. (Source: Stanford Center on Longevity)


30. Meal Programs Fifty-seven percent of school districts offer locally sourced 
fruits and vegetables in their meal programs. You randomly select eight 
school districts and ask them whether they offer locally sourced fruits and 
vegetables in their meal programs. The random variable represents the 
number of school districts that offer locally sourced fruits and vegetables in 
their meal programs. (Source: School Nutrition Association) 


Finding and Interpreting Mean, Variance, and Standard Deviation 
In Exercises 31-36, find the mean, variance, and standard deviation of the 
binomial distribution for the given random variable. Interpret the results. 


31. Political Correctness Seventy-one percent of U.S. adults think that political 
correctness is a problem in America today. You randomly select seven U.S. 
adults and ask them whether they think that political correctness is a problem 
in America today. The random variable represents the number of U.S. adults 
who think that political correctness is a problem in America today. (Source: 
Rasmussen Reports) 


32. Rap and Hip-Hop Music Fifty percent of adults are offended by how 
men portray women in rap and hip-hop music. You randomly select four 
adults and ask them whether they are offended by how men portray women 
in rap and hip-hop music. The random variable represents the number of 
adults who are offended by how men portray women in rap and hip-hop 
music. (Source: Empower Women) 


33. Life on Other Planets Seventy-nine percent of U.S. adults believe that 
life on other planets is plausible. You randomly select eight U.S. adults and 
ask them whether they believe that life on other planets is plausible. The 
random variable represents the number of adults who believe that life on 
other planets is plausible. (Source: Ipsos) 


34. Federal Involvement in Fighting Local Crime Thirty-six percent of likely
U.S. voters think that the federal government should get more involved in
fighting local crime. You randomly select five likely U.S. voters and ask them
whether they think that the federal government should get more involved in
fighting local crime. The random variable represents the number of likely
U.S. voters who think that the federal government should get more involved
in fighting local crime. (Source: Rasmussen Reports)


35. Late for Work Thirty-two percent of U.S. employees who are late for
work blame oversleeping. You randomly select six U.S. employees who are
late for work and ask them whether they blame oversleeping. The random
variable represents the number of U.S. employees who are late for work and
blame oversleeping. (Source: CareerBuilder)


36. Supreme Court Ten percent of college graduates think that Judge Judy
serves on the Supreme Court. You randomly select five college graduates
and ask them whether they think that Judge Judy serves on the Supreme
Court. The random variable represents the number of college graduates who
think that Judge Judy serves on the Supreme Court. (Source: CNN)


Extending Concepts 


Multinomial Experiments In Exercises 37 and 38, use the information below.


A multinomial experiment satisfies these conditions.


The experiment has a fixed number of trials n, where each trial is
independent of the other trials.


Each trial has k possible mutually exclusive outcomes: $E_1, E_2, E_3, \ldots, E_k$.


Each outcome has a fixed probability. So, $P(E_1) = p_1$, $P(E_2) = p_2$,
$P(E_3) = p_3$, ..., $P(E_k) = p_k$. The sum of the probabilities for all
outcomes is $p_1 + p_2 + p_3 + \cdots + p_k = 1$.


The number of times $E_1$ occurs is $x_1$, the number of times $E_2$ occurs is $x_2$,
the number of times $E_3$ occurs is $x_3$, and so on.


The discrete random variable x counts the number of times $x_1, x_2, x_3, \ldots, x_k$
that each outcome occurs in n independent trials where
$x_1 + x_2 + x_3 + \cdots + x_k = n$. The probability that x will occur is

$$P(x) = \frac{n!}{x_1!\,x_2!\,x_3!\cdots x_k!}\,p_1^{x_1} p_2^{x_2} p_3^{x_3} \cdots p_k^{x_k}.$$


37. Genetics According to a theory in genetics, when tall and colorful plants
are crossed with short and colorless plants, four types of plants will result: tall
and colorful, tall and colorless, short and colorful, and short and colorless,
with corresponding probabilities of %, +, 4, and ;. Ten plants are selected.
Find the probability that 5 will be tall and colorful, 2 will be tall and colorless,
2 will be short and colorful, and 1 will be short and colorless.


38. Genetics Another proposed theory in genetics gives the corresponding
probabilities for the four types of plants described in Exercise 37 as x, in ie
and a Ten plants are selected. Find the probability that 5 will be tall and
colorful, 2 will be tall and colorless, 2 will be short and colorful, and 1 will be
short and colorless.


39. Manufacturing An assembly line produces 10,000 automobile parts. Twenty
percent of the parts are defective. An inspector randomly selects 10 of the parts.


(a) Use the Multiplication Rule (discussed in Section 3.2) to find the
probability that none of the selected parts are defective. (Note that the
events are dependent.)


(b) Because the sample is only 0.1% of the population, treat the events as
independent and use the binomial probability formula to approximate
the probability that none of the selected parts are defective.


(c) Compare the results of parts (a) and (b). 
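The multinomial probability formula given before Exercises 37 and 38 translates directly into code. The Python sketch below is a minimal illustration; the function name multinomial_pmf and the example probabilities 0.5, 0.3, and 0.2 are made up for demonstration and are not taken from the exercises. With k = 2 outcomes the formula reduces to the binomial probability formula, which the last line checks.

```python
from math import comb, factorial, isclose, prod

def multinomial_pmf(counts, probs):
    """P = n! / (x1! x2! ... xk!) * p1^x1 * p2^x2 * ... * pk^xk, where n = x1 + ... + xk."""
    if len(counts) != len(probs):
        raise ValueError("need one probability per outcome")
    if not isclose(sum(probs), 1.0):
        raise ValueError("the outcome probabilities must sum to 1")
    n = sum(counts)
    coefficient = factorial(n)
    for x in counts:
        coefficient //= factorial(x)   # each partial quotient is an exact integer
    return coefficient * prod(p**x for p, x in zip(probs, counts))

# Hypothetical example: 6 trials, 3 outcomes with probabilities 0.5, 0.3, 0.2
print(multinomial_pmf([3, 2, 1], [0.5, 0.3, 0.2]))

# With k = 2 the result matches the binomial formula nCx * p^x * q^(n - x)
print(multinomial_pmf([2, 3], [0.4, 0.6]), comb(5, 2) * 0.4**2 * 0.6**3)
```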


Binomial Distribution 


ACTIVITY 


The binomial distribution applet allows you to simulate values from a binomial
distribution. You can specify the parameters for the binomial distribution
(n and p) and the number of values to be simulated (N). When you click
SIMULATE, N values from the specified binomial distribution will be plotted at
the right. The frequency of each outcome is shown in the plot.


You can find the interactive applet for this activity within MyLab Statistics or at
www.pearsonglobaleditions.com.


[Applet screenshot: input boxes n = 10, p = 0.5, N = 100; SIMULATE button; plot of Outcomes]


EXPLORE 


Step 1 Specify a value of n. Step 2 Specify a value of p. 
Step 3 Specify a value of N. Step 4 Click SIMULATE. 


DRAW CONCLUSIONS 


1. During a presidential election year, 70% of a county’s eligible voters cast a vote. 
Simulate selecting n = 10 eligible voters N = 10 times (for 10 communities in 
the county). Use the results to estimate the probability that the number who 
voted in this election is (a) exactly 5, (b) at least 8, and (c) at most 7. 


2. During a non-presidential election year, 20% of the eligible voters in the same
county as in Exercise 1 cast a vote. Simulate selecting n = 10 eligible voters
N = 10 times (for 10 communities in the county). Use the results to estimate
the probability that the number who voted in this election is (a) exactly 4,
(b) at least 5, and (c) less than 4.


3. For the election in Exercise 1, simulate selecting n = 10 eligible voters 
N = 100 times. Estimate the probability that the number who voted in this 
election is exactly 5. Compare this result with the result in Exercise 1 part (a). 
Which of these is closer to the probability found using the binomial probability 
formula? 
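If the applet is not available, its behavior can be imitated with a short simulation. The Python sketch below is an assumption about how such a simulation could work, not a description of the applet itself: each simulated value counts the successes in n independent trials with success probability p, and the relative frequency of an outcome estimates its probability. The function name simulate_binomial and the seed value are hypothetical.

```python
import random
from collections import Counter

def simulate_binomial(n, p, N, seed=None):
    """Simulate N values from a binomial distribution with parameters n and p.

    Each value is the number of successes in n independent trials,
    where a trial succeeds when a uniform random number in [0, 1) is below p.
    Returns a Counter mapping each outcome to its frequency.
    """
    rng = random.Random(seed)
    values = [sum(rng.random() < p for _ in range(n)) for _ in range(N)]
    return Counter(values)

# Example inputs: n = 10, p = 0.5, N = 100
freq = simulate_binomial(n=10, p=0.5, N=100, seed=1)
for outcome in range(11):
    print(outcome, freq[outcome])          # a Counter returns 0 for unseen outcomes
print("estimated P(x = 5):", freq[5] / 100)
```

Changing n, p, and N gives the simulations asked for in the Draw Conclusions exercises; because the results are random, estimates based on N = 100 usually land closer to the binomial formula than estimates based on N = 10.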




Distribution of Number of Hits in Baseball Games 


The official website of Major League Baseball, MLB.com, records detailed statistics about 
players and games. 
During the 2016 regular season, Dustin Pedroia of the Boston Red Sox had a batting 
average of 0.318. The graphs below show the number of hits he had in games in which he had 
different numbers of at-bats. 


[Bar graphs: Games with Three At-Bats, Games with Four At-Bats, and Games with Five At-Bats; each plots Frequency against Number of hits]


1. Construct a probability distribution for the number of hits in games with
(a) 3 at-bats, (b) 4 at-bats, and (c) 5 at-bats.


2. Construct binomial probability distributions for p = 0.318 and (a) n = 3,
(b) n = 4, and (c) n = 5.


3. Compare your distributions from Exercise 1 and Exercise 2. Is a binomial
distribution a good model for determining the numbers of hits in a baseball
game for a given number of at-bats? Explain your reasoning and include a
discussion of the four conditions for a binomial experiment.


4. During the 2016 regular season, Kris Bryant of the Chicago Cubs had
37 games with 3 at-bats. Of these games, he had 11 games with no hits,
18 games with one hit, 7 games with two hits, and 1 game with three hits.


(a) Based on Pedroia’s and Bryant’s hits in games with 3 at-bats, which
player do you think had the higher batting average?


(b) Look up Bryant’s 2016 regular season batting average. Was your
expectation from part (a) correct? If not, propose a reason why.

Case Study 237 




4.3 More Discrete Probability Distributions


What You Should Learn 


• How to find probabilities using the geometric distribution


• How to find probabilities using the Poisson distribution


Tech Tip


You can use technology such as Minitab, Excel, StatCrunch, or the TI-84 Plus
to find a geometric probability. For instance, here are instructions for finding a
geometric probability on a TI-84 Plus. From the DISTR menu, choose the
geometpdf( feature. Enter the values of p and x. Then calculate the probability.


The Geometric Distribution • The Poisson Distribution • Summary of
Discrete Probability Distributions


The Geometric Distribution 


Many actions in life are repeated until a success occurs. For instance, you might 
have to send an email several times before it is successfully sent. A situation such 
as this can be represented by a geometric distribution. 


DEFINITION 


A geometric distribution is a discrete probability distribution of a random 
variable x that satisfies these conditions. 


1. A trial is repeated until a success occurs. 
2. The repeated trials are independent of each other. 


3. The probability of success p is the same for each trial. 


4. The random variable x represents the number of the trial in which the first 
success occurs. 


The probability that the first success will occur on trial number x is 


$P(x) = pq^{x-1}$, where $q = 1 - p$.


In other words, when the first success occurs on the third trial, the outcome
is FFS, and the probability is $P(3) = q \cdot q \cdot p$, or $P(3) = p \cdot q^2$.
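The geometric probability formula is a one-line function. This minimal Python sketch (the name geometric_pmf is hypothetical) evaluates P(x) = pq^(x-1); it confirms that P(3) equals q·q·p and reproduces the TI-84 value 0.07963299 for p = 0.43 and x = 4, the case worked in Example 1.

```python
def geometric_pmf(x: int, p: float) -> float:
    """Probability that the first success occurs on trial x: P(x) = p * q^(x - 1)."""
    if x < 1:
        raise ValueError("x must be a positive integer (trial numbers start at 1)")
    q = 1 - p
    return p * q ** (x - 1)

# First success on the third trial (outcome FFS) has probability q * q * p
p = 0.2
print(geometric_pmf(3, p), (1 - p) * (1 - p) * p)

# Failure rate p = 0.43, first failure on the fourth smartphone
print(round(geometric_pmf(4, 0.43), 8))
```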


EXAMPLE 1


Using the Geometric Distribution
A study found that the smartphones made by a certain manufacturer had a
failure rate of 43%. Four smartphones made by this manufacturer are selected
at random. Find the probability that the fourth smartphone is the first one to
have a failure. (Source: Blancco Technology Group)
SOLUTION
Using p = 0.43, q = 0.57, and x = 4, you have

$P(4) = 0.43(0.57)^{4-1} = 0.43(0.57)^3 \approx 0.080.$

So, the probability that the fourth smartphone is the first one to have a failure
is about 0.080. You can use technology to check this result. For instance, using
a TI-84 Plus, you can find P(4), as shown below.


TI-84 PLUS 


geometpdf(.43,4)
.07963299 


Tech Tip 


You can use technology such as Minitab, Excel, StatCrunch, or the TI-84 Plus
to find a Poisson probability. For instance, here are instructions for finding a
Poisson probability on a TI-84 Plus. From the DISTR menu, choose the
poissonpdf( feature. Enter the values of μ and x. (Note that the TI-84 Plus
uses the Greek letter lambda, λ, in place of μ.) Then calculate the probability.


TI-84 PLUS 


poissonpdf(3, 4) 
.1680313557 


SECTION 4.3 More Discrete Probability Distributions 239 


TRY IT YOURSELF 1 


The study in Example 1 found that the smartphones made by a second 
manufacturer had a failure rate of 14%. Six smartphones made by this 
manufacturer are selected at random. Find the probability that the sixth 
smartphone is the first one to have a failure. (Source: Blancco Technology Group) 

Answer: Page A35 


Even though theoretically a success may never occur, the geometric
distribution is a discrete probability distribution because the values of x can be
listed: 1, 2, 3, .... Notice that as x becomes larger, P(x) gets closer to zero. For
instance, in Example 1, the probability that the thirtieth smartphone is the first
one to have a failure is


$P(30) = 0.43(0.57)^{30-1} = 0.43(0.57)^{29} \approx 0.00000004$
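Two consequences of the formula can be checked numerically: P(x) shrinks toward zero as x grows, and the probabilities for x = 1, 2, 3, ... add up to 1 even though x has no upper limit (the first m terms sum to 1 - q^m). The Python sketch below is a minimal illustration using p = 0.43 from Example 1; the function name is hypothetical.

```python
def geometric_pmf(x, p):
    """P(x) = p * q^(x - 1), the probability that the first success is on trial x."""
    return p * (1 - p) ** (x - 1)

p = 0.43
print(f"P(30) = {geometric_pmf(30, p):.8f}")   # roughly 0.00000004, as in the text

# Partial sums P(1) + ... + P(m) equal 1 - q^m, which approaches 1 as m grows
for m in (5, 10, 30):
    partial = sum(geometric_pmf(x, p) for x in range(1, m + 1))
    print(m, partial, 1 - (1 - p) ** m)
```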


The Poisson Distribution 


In a binomial experiment, you are interested in finding the probability of a 
specific number of successes in a given number of trials. Suppose instead that 
you want to know the probability that a specific number of occurrences takes 
place within a given unit of time, area, or volume. For instance, to determine the 
probability that an employee will take 15 sick days within a year, you can use the 
Poisson distribution. 


DEFINITION 


The Poisson distribution is a discrete probability distribution of a random 
variable x that satisfies these conditions. 


1. The experiment consists of counting the number of times x an event occurs 
in a given interval. The interval can be an interval of time, area, or volume. 


2. The probability of the event occurring is the same for each interval. 


3. The number of occurrences in one interval is independent of the number 
of occurrences in other intervals. 


The probability of exactly x occurrences in an interval is


$P(x) = \dfrac{\mu^x e^{-\mu}}{x!}$

where e is an irrational number approximately equal to 2.71828 and μ is the
mean number of occurrences per interval unit.
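The Poisson probability formula can be evaluated with Python's standard math module. The sketch below is minimal (the name poisson_pmf is hypothetical); it uses math.exp for e raised to the power -μ rather than the rounded value 2.71828, and it reproduces the TI-84 value 0.1680313557 for μ = 3 and x = 4, the case worked in Example 2.

```python
from math import exp, factorial

def poisson_pmf(x: int, mu: float) -> float:
    """Probability of exactly x occurrences in an interval: P(x) = mu^x * e^(-mu) / x!."""
    if x < 0:
        raise ValueError("x must be a nonnegative integer")
    return mu ** x * exp(-mu) / factorial(x)

print(poisson_pmf(4, 3))   # about 0.168

# The probabilities over x = 0, 1, 2, ... add up to 1; a long partial sum is very close
print(sum(poisson_pmf(x, 3) for x in range(50)))
```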


EXAMPLE 2


Using the Poisson Distribution
The mean number of accidents per month at a certain intersection is three.
What is the probability that in any given month four accidents will occur at
this intersection?


SOLUTION


Using x = 4 and μ = 3, the probability that 4 accidents will occur in any given
month at the intersection is

$P(4) \approx \dfrac{3^4 (2.71828)^{-3}}{4!} \approx 0.168.$

You can use technology to check this result. For instance, using a TI-84 Plus,
you can find P(4), as shown at the left.




Picturing the World


The first successful suspension bridge built in the United States, the Tacoma
Narrows Bridge, spans the Tacoma Narrows in Washington State. The average
occupancy of vehicles that travel across the bridge is 1.6. The probability
distribution shown below represents the vehicle occupancy on the bridge during
a five-day period. (Adapted from Washington State Department of Transportation)


[Histogram: Probability P(x) versus Number of people in vehicle, x = 1, 2, 3, 4, 5, 6+]


During the five-day period, what is the probability that a randomly selected
vehicle has two occupants or fewer?


TRY IT YOURSELF 2 


What is the probability that more than four accidents will occur in any given 
month at the intersection? 
Answer: Page A35 


In Example 2, you used a formula to determine a Poisson probability. You
can also use a table to find Poisson probabilities. Table 3 in Appendix B lists the
Poisson probabilities for selected values of x and μ. You can also use technology
tools, such as Minitab, Excel, and the TI-84 Plus, to find Poisson probabilities.


EXAMPLE 3


Finding a Poisson Probability Using a Table

A population count shows that the average number of rabbits per acre living
in a field is 3.6. Use a table to find the probability that seven rabbits are found
on any given acre of the field.


SOLUTION

A portion of Table 3 in Appendix B is shown here. Using the distribution
for μ = 3.6 and x = 7, you can find the Poisson probability as shown by the
highlighted areas in the table.


[Portion of Table 3 in Appendix B: Poisson probabilities for x = 0 to 10 and μ = 3.1 to 3.7, with the column μ = 3.6 and the row x = 7 highlighted]


According to the table, the probability is 0.0425. You can check this result 
using technology. As shown below using Excel, the probability is 0.042484. 
After rounding to four decimal places, the probability is 0.0425, which is the 
same value found using the table. 


     A
1    POISSON(7,3.6,FALSE)
2    0.042484


Interpretation So, the probability that seven rabbits are found on any given 
acre is 0.0425. Because 0.0425 is less than 0.05, this can be considered an 
unusual event. 
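A column of Table 3 can be regenerated from the Poisson formula. The Python sketch below is a minimal illustration (the function name is hypothetical) that computes P(x) for μ = 3.6 and x = 0 through 10, rounded to four decimal places, on the assumption that the table entries are the formula values rounded to four places. The entry for x = 7 matches the table value 0.0425, and the unrounded value matches the Excel result 0.042484.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(x) = mu^x * e^(-mu) / x!."""
    return mu ** x * exp(-mu) / factorial(x)

mu = 3.6   # mean number of rabbits per acre
column = {x: round(poisson_pmf(x, mu), 4) for x in range(11)}
for x, prob in column.items():
    print(f"{x:>2}  {prob:.4f}")
```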


TRY IT YOURSELF 3 


Two thousand brown trout are introduced into a small lake. The lake has a 
volume of 20,000 cubic meters. Use a table to find the probability that three 
brown trout are found in any given cubic meter of the lake. 

Answer: Page A35 


Summary of Discrete Probability Distributions


The table summarizes the discrete probability distributions discussed in this
chapter.


Binomial Distribution

Summary
A binomial experiment satisfies these conditions.
1. The experiment has a fixed number n of independent trials.
2. There are only two possible outcomes for each trial. Each outcome can be
classified as a success or as a failure.
3. The probability of success p is the same for each trial.
4. The random variable x counts the number of successful trials.
The parameters of a binomial distribution are n and p.

Formulas
n = the number of trials
x = the number of successes in n trials
p = probability of success in a single trial
q = probability of failure in a single trial, $q = 1 - p$
The probability of exactly x successes in n trials is
$P(x) = {}_nC_x\, p^x q^{n-x} = \dfrac{n!}{(n-x)!\,x!}\, p^x q^{n-x}$
$\mu = np$
$\sigma^2 = npq$
$\sigma = \sqrt{npq}$


Geometric Distribution

Summary
A geometric distribution is a discrete probability distribution of a random
variable x that satisfies these conditions.
1. A trial is repeated until a success occurs.
2. The repeated trials are independent of each other.
3. The probability of success p is the same for each trial.
4. The random variable x represents the number of the trial in which the first
success occurs.
The parameter of a geometric distribution is p.

Formulas
x = the number of the trial in which the first success occurs
p = probability of success in a single trial
q = probability of failure in a single trial, $q = 1 - p$
The probability that the first success occurs on trial number x is
$P(x) = pq^{x-1}$.


Poisson Distribution

Summary
The Poisson distribution is a discrete probability distribution of a random
variable x that satisfies these conditions.
1. The experiment consists of counting the number of times x an event occurs
over a specified interval of time, area, or volume.
2. The probability of the event occurring is the same for each interval.
3. The number of occurrences in one interval is independent of the number of
occurrences in other intervals.
The parameter of the Poisson distribution is μ.

Formulas
x = the number of occurrences in the given interval
μ = the mean number of occurrences in a given interval unit
The probability of exactly x occurrences in an interval is
$P(x) = \dfrac{\mu^x e^{-\mu}}{x!}$.




4.3 EXERCISES 




Building Basic Skills and Vocabulary 


In Exercises 1-4, find the indicated probability using the geometric distribution.


1. Find P(3) when p = 0.65.    2. Find P(1) when p = 0.45.
3. Find P(5) when p = 0.09.    4. Find P(8) when p = 0.28.


In Exercises 5-8, find the indicated probability using the Poisson distribution.


5. Find P(4) when μ = 5.    6. Find P(3) when μ = 6.
7. Find P(2) when μ = 1.5.    8. Find P(5) when μ = 9.8.


9. In your own words, describe the difference between the value of x in a
binomial distribution and in a geometric distribution.


10. In your own words, describe the difference between the value of x in a
binomial distribution and in the Poisson distribution.


Using and Interpreting Concepts 


Using a Distribution to Find Probabilities In Exercises 11-26, find the
indicated probabilities using the geometric distribution, the Poisson distribution,
or the binomial distribution. Then determine whether the events are unusual. If
convenient, use a table or technology to find the probabilities.


11. Clearing an Exam The probability that you will clear an entrance exam
in any attempt is 0.26. Find the probability that you (a) clear the exam on
the fourth attempt, (b) clear the exam on the first, second, or third attempt,
and (c) do not clear the exam on the first three attempts.


12. Defective Parts A spare parts seller finds that 3 in every 100 parts sold are
defective. Find the probability that (a) the first defective part is the eighth
part sold, (b) the first defective part is the first, second, or third part sold,
and (c) none of the first 8 parts sold are defective.


13. Migrants The mean number of international migrants gained per minute
in the United States in a recent year was about two. Find the probability
that the number of international migrants gained in any given minute is
(a) exactly five, (b) at least five, and (c) more than five. (Source: U.S. Census
Bureau)


14. Grammatical Errors A publisher finds that the mean number of grammatical
errors per page of a book is six. Find the probability that the number of
grammatical errors found on any given page is (a) exactly four, (b) at most
four, and (c) more than four.


15. Pass Completions Football player Ben Roethlisberger completes a pass
64.1% of the time. Find the probability that (a) the first pass he completes
is the second pass, (b) the first pass he completes is the first or second pass,
and (c) he does not complete his first two passes. (Source: National Football
League)


16. Pilot Test The probability that a student passes the written test for a
private pilot license is 0.67. Find the probability that the student (a) passes
on the first attempt, (b) passes on the second attempt, and (c) does not pass on
the first or second attempt.


17. Cloth Manufacturer A cloth manufacturer finds that 1 in every 400 shirts
produced is faded. Find the probability that (a) the first faded shirt is the
eighth item produced, (b) the first faded shirt is the first, second, or third
item produced, and (c) none of the first eight shirts produced are faded.


18. Winning a Prize A cereal maker places a toy in each of its cereal boxes.
The probability of winning this toy is 1 in 5. Find the probability that you
(a) win your first toy with your fifth purchase, (b) win your first toy with your
first, second, third, or fourth purchase, and (c) do not win a toy with your
first five purchases.


19. Droughts The mean number of droughts in Asia per year from 1980
through 2008 was about 3.24. Find the probability that the number of
droughts in Asia in any given year from 1980 through 2008 is (a) exactly
two, (b) at most two, and (c) more than two. (Source: UNISDR)


20. Living Donor Transplants The mean number of organ transplants from
living donors performed per day in the United States in 2016 was about 16.
Find the probability that the number of organ transplants from living donors
performed on any given day is (a) exactly 12, (b) at least eight, and (c) no
more than 10. (Source: United Network for Organ Sharing)


21. Reservations Fifty-two percent of adults in Delhi are unaware of the
reservation system in India. You randomly select six adults in Delhi. Find the
probability that the number of adults in Delhi who are unaware of the
reservation system in India is (a) exactly five, (b) less than four, and (c) at
least four. (Source: The Wire)


22. Teen Instagram Use Sixty-three percent of U.S. teenagers say that they
use Instagram daily. You randomly select seven U.S. teenagers. Find
the probability that the number of U.S. teenagers who say that they use
Instagram daily is (a) exactly two, (b) more than three, and (c) between one
and four, inclusive. (Source: eMarketer)


23. Paying for College Education Sixty-eight percent of parents of children
ages 8-14 say they are willing to get a second or part-time job to pay for
their children’s college education. You randomly select five parents. Find
the probability that the number of parents who say they are willing to get
a second or part-time job to pay for their children’s college education is
(a) exactly three, (b) less than four, and (c) at least three. (Source: T. Rowe
Price Group, Inc.)


24. Cheating Sixty-eight percent of undergraduate students admit to cheating
on tests or in written work. You randomly select six undergraduate students.
Find the probability that the number of undergraduate students who admit
to cheating on tests or in written work is (a) exactly four, (b) more than two,
and (c) at most five. (Source: The Atlantic)


25. Precipitation In Akron, Ohio, the mean number of days in April with 0.01
inch or more of precipitation is 14. Find the probability that the number of
days in April with 0.01 inch or more of precipitation in Akron is (a) exactly
17 days, (b) at most 17 days, and (c) more than 17 days. (Source: National
Climatic Data Center)


26. Oil Tankers The mean number of oil tankers at a port city is eight per
day. Find the probability that the number of oil tankers on any given day is
(a) exactly eight, (b) at most three, and (c) more than eight.


244 CHAPTER 4 Discrete Probability Distributions


Extending Concepts 


27. Comparing Binomial and Poisson Distributions An automobile
manufacturer finds that 1 in every 2500 automobiles produced has a
specific manufacturing defect. (a) Use a binomial distribution to find
the probability of finding 4 cars with the defect in a random sample of
6000 cars. (b) The Poisson distribution can be used to approximate the
binomial distribution for large values of n and small values of p. Repeat
part (a) using the Poisson distribution and compare the results.


28. Hypergeometric Distribution Binomial experiments require that any
sampling be done with replacement because each trial must be independent
of the others. The hypergeometric distribution also has two outcomes:
success and failure. The sampling, however, is done without replacement.
For a population of N items having k successes and N − k failures, the
probability of selecting a sample of size n that has x successes and n − x
failures is given by

$$P(x) = \frac{({}_{k}C_{x})({}_{N-k}C_{n-x})}{{}_{N}C_{n}}$$

In a shipment of 15 microchips, 2 are defective and 13 are not defective. A
sample of three microchips is chosen at random. Use the above formula to
find the probability that (a) all three microchips are not defective, (b) one
microchip is defective and two are not defective, and (c) two microchips are
defective and one is not defective.
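A minimal Python sketch (assuming Python 3.8+ for `math.comb`) that evaluates the formulas needed for Exercises 27 and 28: the binomial formula, its Poisson approximation with μ = np, and the hypergeometric formula above. The function names are illustrative, not taken from any library.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """P(x) = nCx * p^x * q^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x, mu):
    """P(x) = mu^x * e^(-mu) / x!"""
    return mu**x * exp(-mu) / factorial(x)

def hypergeometric_pmf(x, N, k, n):
    """P(x) = (kCx)(N-kCn-x) / NCn: x successes in a sample of size n drawn
    without replacement from N items that contain k successes."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

# Exercise 27: 6000 cars, defect rate 1 in 2500, exactly 4 defective cars
n_cars, p_defect = 6000, 1 / 2500
p_binomial = binomial_pmf(4, n_cars, p_defect)
p_poisson = poisson_pmf(4, n_cars * p_defect)  # Poisson approximation, mu = np = 2.4
print(f"binomial: {p_binomial:.4f}, Poisson: {p_poisson:.4f}")

# Exercise 28: 15 microchips, 2 defective, random sample of 3
for defective in range(3):
    print(defective, round(hypergeometric_pmf(defective, N=15, k=2, n=3), 4))
```

Because n is large and p is small here, the two answers to Exercise 27 agree to several decimal places, which is the point of part (b).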


Geometric Distribution: Mean and Variance In Exercises 29 and 30,
use the fact that the mean of a geometric distribution is $\mu = 1/p$ and the variance
is $\sigma^2 = q/p^2$.


29. Daily Lottery A daily number lottery chooses three balls numbered 0 to 9.
The probability of winning the lottery is 1/1000. Let x be the number of
times you play the lottery before winning the first time. (a) Find the mean,
variance, and standard deviation. (b) How many times would you expect
to have to play the lottery before winning? (c) The price to play is $1 and
winners are paid $500. Would you expect to make or lose money playing this
lottery? Explain.


30. Paycheck Errors A company assumes that 0.5% of the paychecks for
a year were calculated incorrectly. The company has 200 employees
and examines the payroll records from one month. (a) Find the mean,
variance, and standard deviation. (b) How many employee payroll records
would you expect to examine before finding one with an error?
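A short Python sketch of the two formulas stated above, μ = 1/p and σ² = q/p², applied to the success probabilities given in Exercises 29 and 30. The helper name `geometric_summary` is hypothetical.

```python
from math import sqrt

def geometric_summary(p):
    """Mean, variance, and standard deviation of a geometric distribution
    with success probability p, using mu = 1/p and sigma^2 = q/p^2."""
    q = 1 - p
    mean = 1 / p
    variance = q / p**2
    return mean, variance, sqrt(variance)

lottery = geometric_summary(1 / 1000)  # Exercise 29
payroll = geometric_summary(0.005)     # Exercise 30

for label, (mu, var, sd) in [("lottery", lottery), ("payroll", payroll)]:
    print(f"{label}: mean = {mu:.1f}, variance = {var:.1f}, sd = {sd:.1f}")
```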


Poisson Distribution: Variance In Exercises 31 and 32, use the fact that the
variance of the Poisson distribution is $\sigma^2 = \mu$.


31. Golf In a recent year, the mean number of strokes per hole for golfer
Steven Bowditch was about 4.1. (a) Find the variance and standard deviation.
Interpret the results. (b) Find the probability that he would play an 18-hole
round and have more than 72 strokes. (Source: PGATour.com)


32. Bankruptcies The mean number of bankruptcies filed per hour by
businesses in the United States in 2016 was about 2.8. (a) Find the variance
and the standard deviation. Interpret the results. (b) Find the probability
that at most five businesses will file bankruptcy in any given hour. (Source:
Administrative Office of the U.S. Courts)
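A Python sketch of the Poisson calculations in Exercises 31 and 32, using σ² = μ. For part (b) of Exercise 31 it assumes, as an illustration only, that the strokes on different holes are independent Poisson counts, so an 18-hole total is Poisson with mean 18 × 4.1; that modeling choice is an assumption, not something the exercise states.

```python
from math import exp, factorial, sqrt

def poisson_pmf(x, mu):
    return mu**x * exp(-mu) / factorial(x)

def poisson_cdf(x, mu):
    """P(X <= x) for a Poisson random variable with mean mu."""
    return sum(poisson_pmf(k, mu) for k in range(x + 1))

# Exercise 32: bankruptcies, mu = 2.8 per hour; the variance equals the mean
mu = 2.8
variance, sd = mu, sqrt(mu)
p_at_most_five = poisson_cdf(5, mu)
print(f"variance = {variance}, sd = {sd:.3f}, P(x <= 5) = {p_at_most_five:.4f}")

# Exercise 31(b), ASSUMING independent Poisson strokes on each of the 18 holes
round_mean = 18 * 4.1
p_over_72 = 1 - poisson_cdf(72, round_mean)
print(f"P(more than 72 strokes) = {p_over_72:.4f}")
```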


Uses and Abuses: Statistics in the Real World


Uses 


There are countless occurrences of Poisson probability distributions in business, 
sociology, computer science, and many other fields. 

For instance, suppose you work for the fire department in the city of Erie, 
Pennsylvania. You have to make sure the department has enough personnel and 
vehicles on hand to respond to fires, medical emergencies, and other situations 
where they provide aid. The fire department’s records show that they respond 
to an average of 15 incidents per day, but one day the department responds 
to 19 incidents. Is this an unusual event? If so, they may need to update their 
guidelines so that they are prepared to respond to more incidents. 

Knowing the characteristics of the Poisson distribution will help you
answer this type of question. By the time you have completed this course,
you will be able to make educated decisions about the reasonableness of the fire
department's guidelines.


Abuses 


A common misuse of the Poisson distribution is to think that the "most likely"
outcome is the outcome that will occur most of the time. For instance, suppose
you are planning a typical day of responding to emergencies for the fire
department. The most likely number of incidents the department will need to
respond to is 15 (for a Poisson distribution with mean 15, exactly 14 incidents is
equally likely). Although this is the most likely outcome, the probability that it
will occur is only about 0.102. There is about a 0.181 chance the department will
need to respond to 16 or 17 incidents, and about a 0.251 chance of 18 or more
incidents. So, it would be a mistake to simply plan for 15 incidents every day,
thinking that days with fewer incidents and days with more incidents will balance
out over time.

Citizens' safety and even lives can depend on the fire department, so it is
important to be ready for any likely scenario. The smallest number of incidents
above the mean that is unlikely (P < 0.05) is 20, with a probability of about 0.0418,
so the fire department should be prepared to respond to at least 19 incidents per day.
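The probabilities in this discussion can be reproduced with a short Python sketch, assuming the number of daily incidents follows a Poisson distribution with mean 15, as the department's records suggest. The search for the first "unlikely" count scans upward from the mean only.

```python
from math import exp, factorial

MU = 15  # average number of incidents per day, from the fire department's records

def poisson_pmf(x, mu=MU):
    return mu**x * exp(-mu) / factorial(x)

p_15 = poisson_pmf(15)
p_16_or_17 = poisson_pmf(16) + poisson_pmf(17)
p_18_or_more = 1 - sum(poisson_pmf(x) for x in range(18))

# Smallest count at or above the mean whose probability falls below 0.05
first_unlikely = next(x for x in range(MU, 100) if poisson_pmf(x) < 0.05)

print(f"P(15) = {p_15:.3f}")
print(f"P(16 or 17) = {p_16_or_17:.3f}")
print(f"P(18 or more) = {p_18_or_more:.3f}")
print(f"first unlikely count: {first_unlikely}, P = {poisson_pmf(first_unlikely):.4f}")
```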


EXERCISES 


In Exercises 1-3, assume the fire department guidelines are correct and that 
they respond to an average of 15 emergency incidents per day. Use the graph of 
the Poisson distribution and technology to answer the questions. Explain your 
reasoning. 


1. On a random day, what is more likely, 15 emergency
incidents or at least 20 incidents?


2. On a random day, what is more likely, 14 to 16 emergency
incidents or fewer than 14 incidents?


3. On the 4th of July, the fire department responds to 21
incidents. Is there reason to believe the guidelines should
be adjusted for this holiday?


[Figure: Poisson distribution with a mean of 15 incidents per day; horizontal axis: Number of incidents (4 to 28); vertical axis: Probability, P(x)]


Uses and Abuses 245 




4 Chapter Summary 


What Did You Learn? (Example(s); Review Exercises)

Section 4.1
» How to distinguish between discrete random variables and continuous
random variables (Example 1; Exercises 1, 2)
» How to construct and graph a discrete probability distribution (Example 2; Exercises 3, 4)
» How to determine whether a distribution is a probability distribution (Examples 3, 4; Exercises 5, 6)
» How to find the mean, variance, and standard deviation of a discrete
probability distribution (Examples 5, 6; Exercises 7, 8)
   $\mu = \sum xP(x)$   Mean of a discrete random variable
   $\sigma^2 = \sum (x - \mu)^2 P(x)$   Variance of a discrete random variable
   $\sigma = \sqrt{\sigma^2} = \sqrt{\sum (x - \mu)^2 P(x)}$   Standard deviation of a discrete random variable
» How to find the expected value of a discrete probability distribution (Example 7; Exercises 9, 10)
   $E(x) = \mu = \sum xP(x)$   Expected value

Section 4.2
» How to determine whether a probability experiment is a binomial experiment (Example 1; Exercises 11, 12)
» How to find binomial probabilities using the binomial probability formula, a
binomial probability table, and technology (Examples 2, 4-6; Exercises 13-16, 23, 26)
   $P(x) = {}_{n}C_{x}\,p^{x}q^{n-x} = \dfrac{n!}{(n-x)!\,x!}\,p^{x}q^{n-x}$   Binomial probability formula
» How to construct and graph a binomial distribution (Examples 3, 7; Exercises 17, 18)
» How to find the mean, variance, and standard deviation of a binomial
probability distribution (Example 8; Exercises 19, 20)
   $\mu = np$   Mean of a binomial distribution
   $\sigma^2 = npq$   Variance of a binomial distribution
   $\sigma = \sqrt{npq}$   Standard deviation of a binomial distribution

Section 4.3
» How to find probabilities using the geometric distribution (Example 1; Exercises 21, 24)
   $P(x) = pq^{x-1}$   Probability that the first success will occur on trial number x
» How to find probabilities using the Poisson distribution (Examples 2, 3; Exercises 22, 25)
   $P(x) = \dfrac{\mu^{x}e^{-\mu}}{x!}$   Probability of exactly x occurrences in an interval
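The three probability formulas in this summary translate directly into code. This is a minimal Python sketch (assuming Python 3.8+ for `math.comb`); the function names are illustrative rather than a library API.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """P(x) = nCx p^x q^(n-x): probability of exactly x successes in n trials."""
    q = 1 - p
    return comb(n, x) * p**x * q ** (n - x)

def geometric_pmf(x, p):
    """P(x) = p q^(x-1): probability that the first success occurs on trial x."""
    return p * (1 - p) ** (x - 1)

def poisson_pmf(x, mu):
    """P(x) = mu^x e^(-mu) / x!: probability of exactly x occurrences in an interval."""
    return mu**x * exp(-mu) / factorial(x)

# Section 4.3, Exercise 18(a): first toy won on the fifth purchase, p = 1/5
print(round(geometric_pmf(5, 1 / 5), 5))
```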


Review Exercises 247 


4 Review Exercises 


Section 4.1 


In Exercises 1 and 2, determine whether the random variable x is discrete or
continuous. Explain. 


1. Let x represent the number of registers in use at a department store.


2. Let x represent the weight of a truck at a weigh station. 


In Exercises 3 and 4, (a) construct a probability distribution, and (b) graph the 
probability distribution using a histogram and describe its shape. 


3. The number of hits per game played by a Major League Baseball player 


Hits 0 1 2 3 4°15 
Games 29 62 | 33 12 3°51 


4. The number of hours students in a college class slept the previous night 


Hours | 4 | 5 | 6 | 7 | 8 | 9 | 10
Students | 1 | 6 | 13 | 23 | 14 | 4 | 2

In Exercises 5 and 6, determine whether the distribution is a probability 
distribution. If it is not a probability distribution, explain why. 
5. The random variable x represents the number of tickets a police officer writes
out each shift.

x | 0 | 1 | 2 | 3 | 4 | 5 | 6
P(x) | 0.09 | 0.21 | 0.12 | 0.08 | 0.24 | 0.17 | 0.09


6. The random variable x represents the number of misprints on a page of
a book.

x 0 1 2 3 4 5 6 7 
5 2 3 

P(x) 100 720 5 B is i 60 m0 


In Exercises 7 and 8, (a) find the mean, variance, and standard deviation of the 
probability distribution, and (b) interpret the results. 


7. The number of road accidents on a weekday at a busy crossing 


Number of accidents | 0 | 1 | 2 | 3 | 4 | 5 | 6
Probability | 0.150 | 0.165 | 0.265 | 0.195 | 0.110 | 0.090 | 0.025


8. A shopkeeper sells mangoes in 5-, 10-, 20-, 25-, and 40-pound packets. The
distribution of sales for one year is given.


Packet weight (in pounds) | 5 | 10 | 20 | 25 | 40
Probability | 0.250 | 0.325 | 0.200 | 0.100 | 0.125
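A Python sketch of the Section 4.1 formulas for μ, σ², and σ, applied to the road-accident distribution in Exercise 7. It also rejects inputs that fail the two probability-distribution conditions tested in Exercises 5 and 6. The helper name `discrete_summary` is hypothetical.

```python
from math import isclose, sqrt

def discrete_summary(values, probs):
    """Mean, variance, and standard deviation of a discrete probability distribution."""
    # Each probability must lie in [0, 1] and the probabilities must sum to 1
    if any(p < 0 or p > 1 for p in probs) or not isclose(sum(probs), 1.0):
        raise ValueError("not a probability distribution")
    mean = sum(x * p for x, p in zip(values, probs))
    variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
    return mean, variance, sqrt(variance)

# Exercise 7: road accidents on a weekday at a busy crossing
accidents = [0, 1, 2, 3, 4, 5, 6]
probabilities = [0.150, 0.165, 0.265, 0.195, 0.110, 0.090, 0.025]
mean, variance, sd = discrete_summary(accidents, probabilities)
print(f"mean = {mean:.2f}, variance = {variance:.2f}, sd = {sd:.2f}")
```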




Prize Probability 


$200 a 
400 
1 
$100 100 
1 
50 ae 
: 25 


TABLE FOR EXERCISE 10 


In Exercises 9 and 10, find the expected net gain to the player for one play of 
the game. 


9. It costs $12 to bet on a horse race. The horse has a z chance of winning and 
a } chance of placing 2nd or 3rd. You win $42 if the horse wins and receive 
your money back if the horse places 2nd or 3rd. 


10. Playing a gambling game costs $10. The table shows the probability of 
winning various prizes on the game. 


Section 4.2 


In Exercises 11 and 12, determine whether the experiment is a binomial experiment. 
If it is, identify a success, specify the values of n, p, and q, and list the possible
values of the random variable x. If it is not a binomial experiment, explain why. 


11. Bags of milk chocolate M&M’s contain 16% green candies. One candy is 
selected from each of 12 bags. The random variable represents the number 
of green candies selected. (Source: Mars, Inc.) 


12. A fair coin is tossed repeatedly until 15 heads are obtained. The random 
variable x counts the number of tosses. 


In Exercises 13-16, find the indicated binomial probabilities. If convenient, use 
technology or Table 2 in Appendix B. 


13. Fifty-three percent of U.S. adults want to lose weight. You randomly select 
eight U.S. adults. Find the probability that the number of U.S. adults who 
want to lose weight is (a) exactly three, (b) at least three, and (c) more than 
three. (Source: Gallup) 


14. Thirty-nine percent of U.S. adults have a gun in their home. You randomly 
select 12 U.S. adults. Find the probability that the number of U.S. adults who 
have a gun in their home is (a) exactly two, (b) at least two, and (c) more 
than two. (Source: Gallup) 


15. Eighty-eight percent of U.S. civilian full-time employees have access to 
medical care benefits. You randomly select nine civilian full-time employees. 
Find the probability that the number of civilian full-time employees who 
have access to medical care benefits is (a) exactly six, (b) at least six, and 
(c) more than six. (Source: U.S. Bureau of Labor Statistics) 


16. Sixty-two percent of U.S. adults get news on social media sites. You 
randomly select five U.S. adults. Find the probability that the number of U.S. 
adults who get news on social media sites is (a) exactly two, (b) at least two, 
and (c) more than two. (Source: Pew Research Center) 


In Exercises 17 and 18, (a) construct a binomial distribution, (b) graph the 
binomial distribution using a histogram and describe its shape, and (c) identify 
any values of the random variable x that you would consider unusual. Explain 
your reasoning. 


17. Seventy-six percent of stay-at-home mothers have a college degree or higher. 
You randomly select five stay-at-home mothers and ask them whether they 
have a college degree or higher. The random variable represents the number 
of stay-at-home mothers who have a college degree or higher. (Source: 
Hulafrog) 


18. Eighty-eight percent of U.S. adults use the Internet. You randomly select 
six U.S. adults and ask them whether they use the Internet. The random 
variable represents the number of U.S. adults who use the Internet. (Source: 
Pew Research Center) 




In Exercises 19 and 20, find the mean, variance, and standard deviation of the 
binomial distribution for the given random variable. Interpret the results. 


19. About 13% of U.S. drivers are uninsured. You randomly select eight U.S.
drivers and ask them whether they are uninsured. The random variable
represents the number of U.S. drivers who are uninsured. (Source: Insurance
Research Council)


20. Fifty-six percent of college student-athletes receive athletics scholarships.
You randomly select five college student-athletes and ask whether they
receive athletics scholarships. The random variable represents the number
of college student-athletes who receive athletics scholarships. (Source: National
Collegiate Athletic Association)
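For a binomial distribution the mean, variance, and standard deviation come from n and p alone (μ = np, σ² = npq, σ = √(npq)). A brief Python sketch with the values from Exercises 19 and 20; `binomial_summary` is a hypothetical helper name.

```python
from math import sqrt

def binomial_summary(n, p):
    """Return (mean, variance, standard deviation) = (np, npq, sqrt(npq))."""
    q = 1 - p
    mean = n * p
    variance = n * p * q
    return mean, variance, sqrt(variance)

uninsured = binomial_summary(8, 0.13)     # Exercise 19
scholarships = binomial_summary(5, 0.56)  # Exercise 20
print(uninsured)
print(scholarships)
```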


Section 4.3 


In Exercises 21-26, find the indicated probabilities using the geometric
distribution, the Poisson distribution, or the binomial distribution. Then determine
whether the events are unusual. If convenient, use a table or technology to find the
probabilities.


21. Eighty-two percent of people using electronic cigarettes (vapers) are
ex-smokers of conventional cigarettes. You randomly select 10 vapers. Find
the probability that the first vaper who is an ex-smoker of conventional
cigarettes is (a) the second person selected, (b) the fourth or fifth person
selected, and (c) not one of the second through seventh persons selected.
(Source: ChurnMag)


22. During a 77-year period, tornadoes killed about 0.27 people per day in the
United States. Assume this rate holds true today and is constant throughout
the year. Find the probability that the number of people in the United
States killed by a tornado tomorrow is (a) exactly zero, (b) at most two, and
(c) more than one. (Source: National Weather Service)


23. Thirty-six percent of Americans think there is still a need for the practice
of changing their clocks for Daylight Saving Time. You randomly select
seven Americans. Find the probability that the number of Americans who
say there is still a need for changing their clocks for Daylight Saving Time
is (a) exactly four, (b) less than two, and (c) at least six. (Source: Rasmussen
Reports)


24. In a recent season, hockey player Evgeni Malkin scored 27 goals in the 57 games
he played. Assume that his goal production stayed at that level for the next
season. Find the probability that he would get his first goal (a) in the first
game of the season, (b) in the second game of the season, (c) within the first
three games of the season, and (d) not within the first three games of the
season. (Source: National Hockey League)


25. During a 10-year period, sharks killed an average of 6.1 people each year
worldwide. Find the probability that the number of people killed by sharks
next year is (a) exactly three, (b) more than six, and (c) at most five. (Source:
International Shark Attack File)


26. Eighty-two percent of U.S. adults think that healthy children should be
required to be vaccinated to attend school. You randomly select 10 U.S.
adults. Find the probability that the number of U.S. adults who think that
healthy children should be required to be vaccinated to attend school is
(a) exactly eight, (b) more than six, and (c) at most six. (Source: Pew Research
Center)
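A Python sketch of the geometric calculation in Exercise 24. It assumes, as the exercise implies, that the probability of scoring in a given game is 27/57 and that games are independent; exact fractions are used so the complement rule in parts (c) and (d) can be checked exactly.

```python
from fractions import Fraction

p = Fraction(27, 57)  # ASSUMED probability of scoring in any one game (27 goals in 57 games)
q = 1 - p

def geometric_pmf(x):
    """Probability that the first goal comes in game x: p * q^(x - 1)."""
    return p * q ** (x - 1)

first_game = geometric_pmf(1)
second_game = geometric_pmf(2)
within_three = sum(geometric_pmf(x) for x in range(1, 4))
not_within_three = q**3  # no goal in each of the first three games

for label, value in [("(a)", first_game), ("(b)", second_game),
                     ("(c)", within_three), ("(d)", not_within_three)]:
    print(label, f"{float(value):.4f}")
```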




4 Chapter Quiz 


Take this quiz as you would take a quiz in class. After you are done, check your 
work against the answers given in the back of the book. 


1. Determine whether the random variable x is discrete or continuous. Explain
your reasoning.

(a) Let x represent the number of lightning strikes that occur in Wyoming
during the month of June.

(b) Let x represent the amount of fuel (in gallons) used by a jet during
takeoff.

(c) Let x represent the total number of die rolls required for an individual to
roll a five.


2. The table lists the number of wireless devices per household in a small town
in the United States.

Wireless devices | 0 | 1 | 2 | 3 | 4 | 5
Number of households | 277 | 471 | 243 | 105 | 46 | 22

(a) Construct a probability distribution.

(b) Graph the probability distribution using a histogram and describe its
shape.

(c) Find the mean, variance, and standard deviation of the probability
distribution and interpret the results.

(d) Find the probability of randomly selecting a household that has at least
four wireless devices.
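For a frequency table like the one in Exercise 2, each probability is the relative frequency P(x) = f/n. This Python sketch builds that distribution from the table and then applies the Section 4.1 formulas; the variable names are illustrative.

```python
from math import sqrt

devices = [0, 1, 2, 3, 4, 5]
households = [277, 471, 243, 105, 46, 22]

total = sum(households)
probs = [f / total for f in households]  # relative frequencies P(x) = f / n

mean = sum(x * p for x, p in zip(devices, probs))
variance = sum((x - mean) ** 2 * p for x, p in zip(devices, probs))
sd = sqrt(variance)
p_at_least_four = sum(p for x, p in zip(devices, probs) if x >= 4)

print(f"n = {total}, mean = {mean:.3f}, variance = {variance:.3f}, sd = {sd:.3f}")
print(f"P(x >= 4) = {p_at_least_four:.3f}")
```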


3. Thirty-six percent of U.S. adults have postponed medical checkups or
procedures to save money. You randomly select nine U.S. adults. Find the
probability that the number of U.S. adults who have postponed medical
checkups or procedures to save money is (a) exactly three, (b) at most four,
and (c) more than five. (Source: Rasmussen Reports)


4. The five-year success rate of kidney transplant surgery from living donors is
86%. The surgery is performed on six patients. (Source: Mayo Clinic)
(a) Construct a binomial distribution.
(b) Graph the binomial distribution using a histogram and describe its shape.
(c) Find the mean, variance, and standard deviation of the binomial
distribution and interpret the results.


5. An online magazine finds that the mean number of typographical errors per
page is five. Find the probability that the number of typographical errors
found on any given page is (a) exactly five, (b) less than five, and (c) exactly
zero.


6. Basketball player Dwight Howard makes a free throw shot about 56% of the
time. Find the probability that (a) the first free throw shot he makes is the
fourth shot, (b) the first free throw shot he makes is the second or third shot,
and (c) he does not make his first three shots. (Source: ESPN)


7. Which event(s) in Exercise 6 can be considered unusual? Explain your
reasoning.


4 Chapter Test 


Chapter Test 251 


Take this test as you would take a test in class. 


In Exercises 1-3, find the indicated probabilities using the geometric distribution, 
the Poisson distribution, or the binomial distribution. Then determine whether the 
events are unusual. If convenient, use a table or technology to find the probabilities. 


1. One out of every 119 tax returns that a tax auditor examines requires an audit.
Find the probability that (a) the first return requiring an audit is the 25th
return the tax auditor examines, (b) the first return requiring an audit is the
first or second return the tax auditor examines, and (c) none of the first five
returns the tax auditor examines require an audit. (Source: Kiplinger)


2. About 60% of U.S. full-time college students drank alcohol within a
one-month period. You randomly select six U.S. full-time college students.
Find the probability that the number of U.S. full-time college students who
drank alcohol within a one-month period is (a) exactly two, (b) at least three,
and (c) less than four. (Source: National Center for Biotechnology Information)


3. The mean increase in the U.S. population is about four people per minute.
Find the probability that the increase in the U.S. population in any given
minute is (a) exactly six people, (b) more than eight people, and (c) at most
four people. (Source: U.S. Census Bureau)


4. Determine whether the distribution is a probability distribution. If it is not a
probability distribution, explain why.


(a) x | 0 | 5 | 10 | 15 | 20
P(x) | 0.03 | 0.09 | 0.19 | 0.32 | 0.37

(b) | , 1/2/)3 |4)5 | 6 
me 3 | io | 3 | i | Ss | Cs 


5. The table shows the ages of students in a freshman orientation course.
Age 17 | 18 | 19 | 20 | 21 | 22 
Students 2 13 4/3 2 51 


(a) Construct a probability distribution. 
(b) Graph the probability distribution using a histogram and describe its shape. 


(c) Find the mean, variance, and standard deviation of the probability 
distribution and interpret the results. 


(d) Find the probability that a randomly selected student is less than 20 years old. 


6. Seventy-seven percent of U.S. college students pay their bills on time. You
randomly select five U.S. college students and ask them whether they pay
their bills on time. The random variable represents the number of U.S. college
students who pay their bills on time. (Source: Sallie Mae)


(a) Construct a probability distribution. 
(b) Graph the probability distribution using a histogram and describe its shape. 


(c) Find the mean, variance, and standard deviation of the probability 
distribution and interpret the results. 


Putting it all together 


REAL DECISIONS 


The Centers for Disease Control and Prevention (CDC) is required by law to
publish a report on assisted reproductive technology (ART). ART includes all
fertility treatments in which both the egg and the sperm are used. These
procedures generally involve removing eggs from a woman's ovaries, combining
them with sperm in the laboratory, and returning them to the woman's body or
giving them to another woman.

You are helping to prepare the CDC report and select at random
10 ART cycles for a special review. None of the cycles resulted in a
clinical pregnancy. Your manager feels it is impossible to select at
random 10 ART cycles that do not result in a clinical pregnancy. Use
the pie chart at the right and your knowledge of statistics to determine
whether your manager is correct.

[Figure: pie chart, Results of ART Cycles Using Fresh Nondonor Eggs or Embryos; slices for ectopic pregnancy, pregnancy (33.0%), and no pregnancy (66.4%). (Source: Centers for Disease Control and Prevention)]

EXERCISES


1. How Would You Do It?
(a) How would you determine whether your manager is correct, that it is
impossible to select at random 10 ART cycles that do not result in a
clinical pregnancy?
(b) What probability distribution do you think best describes the situation?
Do you think the distribution of the number of clinical pregnancies is
discrete or continuous? Explain your reasoning.

2. Answering the Question
Write an explanation that answers the question, "Is it possible to select at
random 10 ART cycles that do not result in a clinical pregnancy?" Include in
your explanation the appropriate probability distribution and your calculation
of the probability of no clinical pregnancies in 10 ART cycles.

[Figure: bar graph, Pregnancy and Live Birth Rates for ART Cycles Among Women of Age 40 and Older; bars show the pregnancy rate and the live birth rate by age. (Source: Centers for Disease Control and Prevention)]


3. Suspicious Samples?

A lab worker tells you that the samples below were selected at
random. Using the graph at the right, which of the samples would
you consider suspicious? Would you believe that the samples were
selected at random? Explain your reasoning.

(a) A sample of 10 ART cycles among women of age 40, eight of
which resulted in clinical pregnancies

(b) A sample of 10 ART cycles among women of age 41, none of
which resulted in clinical pregnancies


252 CHAPTER 4 _ Discrete Probability Distributions 


TECHNOLOGY 


Using Poisson Distributions as Queuing Models
Queuing means waiting in line to be served. There are many examples
of queuing in everyday life: waiting at a traffic light, waiting in line
at a grocery checkout counter, waiting for an elevator, holding for a
telephone call, and so on.

Poisson distributions are used to model and predict the number of
people (calls, computer programs, vehicles) arriving at the line. In the
exercises below, you are asked to use Poisson distributions to analyze
the queues at a grocery store checkout counter.


In Exercises 1-7, consider a grocery store that can process 
a total of four customers at its checkout counters each 
minute. 


1. The mean number of customers who arrive at the
checkout counters each minute is 4. Create a Poisson
distribution with μ = 4 for x = 0 to 20. Compare your
results with the histogram shown at the upper right.


2. Minitab was used to generate 20 random numbers
with a Poisson distribution for μ = 4. Let the random
number represent the number of arrivals at the
checkout counter each minute for 20 minutes.


33.3.3 3°59 6 7 3. 6 
3.563 4622 4 1 


During each of the first four minutes, only three 
customers arrived. These customers could all be 
processed, so there were no customers waiting after 
four minutes. 
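The bookkeeping described above (customers waiting at the end of a minute equal those already waiting, plus new arrivals, minus the four the store can process, never below zero) can be sketched in Python. The Minitab sample from Exercise 2 is not reproduced here; the first arrival list is hypothetical, and the simulated arrivals use Knuth's multiplication method for Poisson sampling as a stand-in for Minitab's generator.

```python
import math
import random

def poisson_sample(mu, rng):
    """Draw one Poisson(mu) value with Knuth's multiplication method:
    count how many extra uniform draws keep the running product above e^(-mu)."""
    threshold = math.exp(-mu)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def waiting_after_each_minute(arrivals, capacity=4):
    """Customers still waiting at the end of each minute, assuming up to
    `capacity` customers (waiting ones included) are processed per minute."""
    waiting = 0
    history = []
    for arrived in arrivals:
        waiting = max(0, waiting + arrived - capacity)
        history.append(waiting)
    return history

# Hypothetical arrival counts, chosen only to illustrate the bookkeeping
print(waiting_after_each_minute([3, 3, 3, 3, 6, 7, 2, 5]))  # -> [0, 0, 0, 0, 2, 5, 3, 4]

# Simulated arrivals for 20 minutes with mu = 4 (seeded so the run is repeatable)
rng = random.Random(2019)
simulated = [poisson_sample(4, rng) for _ in range(20)]
print(simulated)
print(waiting_after_each_minute(simulated))
```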


(a) How many customers were waiting after 5 minutes? 
6 minutes? 7 minutes? 8 minutes? 


(b) Create a table that shows the number of customers 
waiting at the end of 1 through 20 minutes. 


3. Generate a list of 20 random numbers with a Poisson
distribution for μ = 4. Create a table that shows the
number of customers waiting at the end of 1 through
20 minutes.


Extended solutions are given in the technology manuals that accompany this text. 


Technical instruction is provided for Minitab, Excel, and the TI-84 Plus. 


[Figure: histogram of a Poisson distribution with μ = 4; horizontal axis: Number of arrivals per minute (0 to 20); vertical axis: Probability]


4. The mean increases to 5 arrivals per minute, but the
store can still process only four per minute. Generate
a list of 20 random numbers with a Poisson distribution
for μ = 5. Then create a table that shows the number
of customers waiting at the end of 20 minutes.


5. The mean number of arrivals per minute is 5. What is
the probability that 10 customers will arrive during the
first minute?


6. The mean number of arrivals per minute is 4. Find the
probability that

(a) three, four, or five customers will arrive during the
third minute.

(b) more than four customers will arrive during the
first minute.

(c) more than four customers will arrive during each
of the first four minutes.


7. The mean number of arrivals per minute is 4. Find the
probability that
(a) no customers are waiting in line after one minute.
(b) one customer is waiting in line after one minute.
(c) one customer is waiting in line after one minute
and no customers are waiting in line after the
second minute.
(d) no customers are waiting in line after two minutes.
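Instead of simulating, one can propagate the exact probability of each line length from minute to minute, using the same rule as the simulation (waiting = max(0, waiting + arrivals − 4)). This Python sketch assumes independent Poisson arrivals each minute and an empty line at the start; arrival counts above 60 per minute are ignored because their probability is negligible when μ = 4.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    return mu**x * exp(-mu) / factorial(x)

def queue_length_distribution(mu, capacity, minutes, max_arrivals=60):
    """Probability of each number of waiting customers after `minutes` minutes,
    starting from an empty line. Arrivals above max_arrivals are truncated."""
    dist = {0: 1.0}
    for _ in range(minutes):
        new_dist = {}
        for waiting, p_waiting in dist.items():
            for arrived in range(max_arrivals + 1):
                left = max(0, waiting + arrived - capacity)
                new_dist[left] = new_dist.get(left, 0.0) + p_waiting * poisson_pmf(arrived, mu)
        dist = new_dist
    return dist

after_one = queue_length_distribution(4, 4, 1)
after_two = queue_length_distribution(4, 4, 2)
print(round(after_one[0], 4), round(after_one[1], 4), round(after_two[0], 4))
```

After one minute, "no one waiting" is simply P(x ≤ 4) and "one waiting" is P(x = 5); the two-minute case is where tracking the whole distribution pays off.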


Technology 253 


Normal Probability Distributions


The average dairy cow in the United States produced 22,770 pounds of milk in 2016, 
more than twice the average from 50 years ago. 


5.1 


5.2


5.3 


Case Study 




Activity 


5 


Uses and Abuses 
Real Statistics— Real Decisions 
Technology 


Where You've Been


In Chapters 1 through 4, you learned how to collect 
and describe data, find the probability of an event, and 
analyze discrete probability distributions. You also learned 
that when a sample is used to make inferences about a 
population, it is critical that the sample not be biased. For 
instance, how would you organize a study to determine 
which breed of dairy cow is the most profitable? 


Where You're Going


When the U.S. Department of Agriculture performs this 
study, it uses random sampling and then records the 
measures of various milk production and physical traits such 
as pounds produced, fat percentage, protein percentage, 
productive life, somatic cell count, and calving ability. The 
studies have repeatedly shown Holstein cows to be the 
most profitable breed of dairy cow. Other top breeds are 
Jersey, Brown Swiss, and Ayrshire cows. 


In Chapter 5, you will learn how to recognize normal 
(bell-shaped) distributions and how to use their properties 
in real-life applications. Suppose that you are a farmer 
planning to buy 20 Holstein cows and 10 Jersey cows from a 
breeder. You want to know the probabilities that the groups 
of cows will produce certain average daily amounts of milk. 
You will learn how to calculate this type of probability using 
a sampling distribution of sample means and the Central 
Limit Theorem in Section 5.4. The graphs below show the 
distributions of sample means of milk produced daily by 
the two breeds of cows. 


The table shows the information given to you by the 
breeder. Assume that the amounts of milk produced are 
normally distributed. 


[Figure: Average Daily Milk Production by Holstein Cows; histogram of sample means; horizontal axis: Milk produced (in pounds), 50 to 90; vertical axis: Percent]


Amount of milk produced per day (in pounds) | 


Breed | Mean 
“Holstein = 6930 sd | 
| Jersey | 49.7 10.1 


Standard deviation 


You can use this information to make calculations about 
average amounts of milk produced daily by the cows. For 
instance, the probability that the 20 Holstein cows will 
produce an average of at least 65 pounds of milk per day is 
about 94.95%, and the probability that the 10 Jersey cows 
will produce an average of between 50 and 60 pounds of 
milk per day is about 46.35%. 
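As a preview of Section 5.4, the Jersey probability can be sketched in Python. It assumes, as the text does, normally distributed milk amounts, and uses the Jersey mean (49.7 pounds) and standard deviation (10.1 pounds) from the table, so the sample mean of 10 cows has standard deviation 10.1/√10. The standard normal CDF is built from `math.erf`.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative probability, via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma, n = 49.7, 10.1, 10  # Jersey values from the table; sample of 10 cows
standard_error = sigma / sqrt(n)
p_between = normal_cdf((60 - mu) / standard_error) - normal_cdf((50 - mu) / standard_error)
print(f"P(50 <= sample mean <= 60) = {p_between:.4f}")
```

Without rounding this gives about 0.4619; the 46.35% quoted above appears to come from rounding the z-scores to two decimal places (0.09 and 3.22) before using a standard normal table.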


[Figure: Average Daily Milk Production by Jersey Cows; histogram of sample means; horizontal axis: Milk produced (in pounds), 35 to 70; vertical axis: Percent]




What You Should Learn 


» How to interpret graphs of 
normal probability distributions 


» How to find areas under the
standard normal curve


Study Tip 


A normal curve with mean
μ and standard deviation σ
can be graphed using the
normal probability density
function

$$y = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2/(2\sigma^2)}.$$

(This formula will not be used in
the text.) Because e ≈ 2.718 and
π ≈ 3.14, a normal curve depends
completely on μ and σ.


Introduction to Normal Distributions and the 
Standard Normal Distribution 


CHAPTER 5 Normal Probability Distributions 


Properties of a Normal Distribution » The Standard Normal Distribution


Properties of a Normal Distribution 


In Section 4.1, you distinguished between discrete and continuous random 
variables, and learned that a continuous random variable has an infinite number 
of possible values that can be represented by an interval on a number line. Its 
probability distribution is called a continuous probability distribution. In this 
chapter, you will study the most important continuous probability distribution 
in statistics—the normal distribution. Normal distributions can be used to model 
many sets of measurements in nature, industry, and business. For instance, the 
systolic blood pressures of humans, the lifetimes of smartphones, and housing 
costs can all be modeled as normally distributed random variables.


DEFINITION 


A normal distribution is a continuous probability distribution for a random 
variable x. The graph of a normal distribution is called the normal curve. 
A normal distribution has these properties. 

1. The mean, median, and mode are equal.

2. The normal curve is bell-shaped and is symmetric about the mean.

3. The total area under the normal curve is equal to 1.

4. The normal curve approaches, but never touches, the x-axis as it extends
farther and farther away from the mean.

5. Between μ − σ and μ + σ (in the center of the curve), the graph curves
downward. The graph curves upward to the left of μ − σ and to the right
of μ + σ. The points at which the curve changes from curving upward to
curving downward are called inflection points.


[Figure: normal curve with inflection points at μ − σ and μ + σ and total area = 1; horizontal axis marked from μ − 3σ to μ + 3σ]


You have learned that a discrete probability distribution can be graphed with 
a histogram. For a continuous probability distribution, you can use a probability 
density function (pdf). A probability density function has two requirements: 
(1) the total area under the curve is equal to 1, and (2) the function can never 
be negative. 
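You can check both requirements numerically for a normal curve. The sketch below approximates the area with a midpoint Riemann sum over μ ± 10σ, where the omitted tails are negligible. It also samples the density to confirm that it is never negative. The parameters are illustrative, and the helper `approx_total_area` is hypothetical.

```python
from statistics import NormalDist

def approx_total_area(dist, width=10, steps=20_000):
    """Midpoint-rule estimate of the area under dist's density curve
    over mean +/- width standard deviations; the tails beyond are negligible."""
    lo = dist.mean - width * dist.stdev
    hi = dist.mean + width * dist.stdev
    h = (hi - lo) / steps
    return h * sum(dist.pdf(lo + (i + 0.5) * h) for i in range(steps))

curve = NormalDist(mu=3.5, sigma=0.7)          # illustrative parameters
area = approx_total_area(curve)                # requirement (1): should be about 1
heights = [curve.pdf(x / 10) for x in range(-100, 101)]
smallest = min(heights)                        # requirement (2): never negative
print(f"area = {area:.6f}, smallest sampled height = {smallest:.3g}")
```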


SECTION 5.1 Introduction to Normal Distributions and the Standard Normal Distribution 257 


Picturing the World


According to the National Center 
for Health Statistics, the number 
of births in the United States in 
a recent year was 3,978,497. The 
weights of the newborns can
be approximated by a normal
distribution, as shown in the 
figure. (Adapted from National Center 
for Health Statistics) 


[Figure: Weights of Newborns; horizontal axis Weight (in grams)]


What is the mean weight of 
the newborns? Estimate the 
standard deviation of this normal 
distribution. 


A normal distribution can have any mean and any positive standard 
deviation. These two parameters, μ and σ, determine the shape of the normal
curve. The mean gives the location of the line of symmetry, and the standard 
deviation describes how much the data are spread out. 

For instance, in the figures below, curves A and B have the same mean, and 
curves B and C have the same standard deviation. The total area under each 
curve is 1. Also, in each graph, one of the inflection points occurs one standard 
deviation to the left of the mean, and the other occurs one standard deviation to 
the right of the mean. 


[Figure: three normal curves with their inflection points marked. Curve A: mean μ = 3.5, standard deviation σ = 1.5. Curve B: mean μ = 3.5, standard deviation σ = 0.7. Curve C: mean μ = 1.5, standard deviation σ = 0.7.]


Understanding Mean and Standard Deviation 
1. Which normal curve has a greater mean? 


2. Which normal curve has a greater standard deviation? 


SOLUTION 


1. The line of symmetry of curve A occurs at x = 15. The line of symmetry of 
curve B occurs at x = 12. So, curve A has a greater mean. 


2. Curve B is more spread out than curve A. So, curve B has a greater standard 
deviation. 


TRY IT YOURSELF 1 
1. Which normal curve has the greatest mean? 


2. Which normal curve has the greatest standard deviation? 


[Figure: normal curves on a horizontal axis from 30 to 70]
Answer: Page A35




Tech Tip 


You can use technology
to graph a normal
curve. For instance, you
can use a TI-84 Plus to
graph the normal curve
in Example 2.


Interpreting Graphs of Normal Distributions 


The scaled test scores for the New York State Grade 4 Common Core 
Mathematics Test are normally distributed. The normal curve shown below 
represents this distribution. What is the mean test score? Estimate the 
standard deviation of this normal distribution. (Adapted from New York State 
Education Department) 


[Figure: normal curve for the Mathematics Test; horizontal axis Scaled test score, 180 to 430]


SOLUTION 


Because a normal curve is symmetric about the mean, you can estimate that μ ≈ 305. Because the inflection points are one standard deviation from the mean, you can estimate that σ ≈ 40.

[Figure: the Mathematics Test curve with the mean and inflection points marked; horizontal axis Scaled test score, 180 to 430]


The scaled test scores for the New York State Grade 4 Common Core 
Mathematics Test are normally distributed with a mean of about 305 and a 
standard deviation of about 40. 


Interpretation Using the Empirical Rule (see Section 2.4), you know that 
about 68% of the scores are between 265 and 345, about 95% of the scores 
are between 225 and 385, and about 99.7% of the scores are between 185 
and 425. 
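The interval endpoints in the interpretation are μ ± kσ with μ = 305 and σ = 40. The sketch below computes them and, for comparison, the exact share of a normal model that falls inside each interval. It treats the estimated μ and σ as if they were the true parameters, which is an assumption.

```python
from statistics import NormalDist

mu, sigma = 305, 40                 # estimates read from the graph, treated as exact
scores = NormalDist(mu, sigma)

intervals = {}
for k in (1, 2, 3):
    lo, hi = mu - k * sigma, mu + k * sigma
    intervals[k] = (lo, hi)
    share = scores.cdf(hi) - scores.cdf(lo)     # area under the model curve
    print(f"within {k} standard deviation(s): {lo} to {hi}, about {share:.1%} of scores")
```

The exact shares (about 68.3%, 95.4%, and 99.7%) agree with the rounded Empirical Rule percentages.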


TRY IT YOURSELF 2 


The scaled test scores for the New York State Grade 4 Common Core English 
Language Arts Test are normally distributed. The normal curve shown 
below represents this distribution. What is the mean test score? Estimate the 
standard deviation of this normal distribution. (Adapted from New York State 
Education Department) 


[Figure: normal curve for the English Language Arts Test; horizontal axis Scaled test score]


Answer: Page A35 




Study Tip 


Because every normal 
distribution can be 
transformed to the standard 
normal distribution, you can use 
z-scores and the standard normal 
curve to find areas (and therefore 
probabilities) under any normal curve. 


Study Tip

It is important that you know the difference between x and z. The random variable x is sometimes called a raw score and represents values in a nonstandard normal distribution, whereas z represents values in the standard normal distribution.


The Standard Normal Distribution 


There are infinitely many normal distributions, each with its own mean and 
standard deviation. The normal distribution with a mean of 0 and a standard 
deviation of 1 is called the standard normal distribution. The horizontal scale 
of the graph of the standard normal distribution corresponds to z-scores. 
In Section 2.5, you learned that a z-score is a measure of position that indicates 
the number of standard deviations a value lies from the mean. Recall that you 
can transform an x-value to a z-score using the formula 
$$z = \frac{\text{Value} - \text{Mean}}{\text{Standard deviation}} = \frac{x - \mu}{\sigma}.$$

Round to the nearest hundredth.
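The conversion formula translates into a one-line function. In this sketch the name `z_score` is hypothetical, and the guard against a nonpositive standard deviation is our own addition.

```python
def z_score(x, mean, std_dev):
    """Standard deviations between x and the mean, rounded to the nearest hundredth."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return round((x - mean) / std_dev, 2)

# A score of 345 on a test with mean 305 and standard deviation 40
print(z_score(345, 305, 40))
# 4 hours when the mean is 22 hours and the standard deviation is 9 hours
print(z_score(4, 22, 9))
```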


DEFINITION 


The standard normal distribution is a normal distribution with a mean of 0 
and a standard deviation of 1. The total area under its normal curve is 1. 


[Figure: Standard Normal Distribution; horizontal axis z]


When each data value of a normally distributed random variable x is 
transformed into a z-score, the result will be the standard normal distribution. 
After this transformation takes place, the area that falls in the interval under the 
nonstandard normal curve is the same as that under the standard normal curve 
within the corresponding z-boundaries. 

In Section 2.4, you learned to use the Empirical Rule to approximate areas 
under a normal curve when the values of the random variable x corresponded to 
−3, −2, −1, 0, 1, 2, or 3 standard deviations from the mean. Now, you will learn
to calculate areas corresponding to other x-values. After you use the formula
above to transform an x-value to a z-score, you can use the Standard Normal
Table (Table 4 in Appendix B). The table lists the cumulative area under the
standard normal curve to the left of z for z-scores from −3.49 to 3.49. As you
examine the table, notice the following. 


Properties of the Standard Normal Distribution 


1. The cumulative area is close to 0 for z-scores close to z = −3.49.

2. The cumulative area increases as the z-scores increase.

3. The cumulative area for z = 0 is 0.5000.

4. The cumulative area is close to 1 for z-scores close to z = 3.49.
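Python's `statistics.NormalDist()` has mean 0 and standard deviation 1 by default and returns cumulative areas directly. You can use it to observe these properties at the ends and middle of the table's range. This is a sketch, not a reproduction of Table 4.

```python
from statistics import NormalDist

standard = NormalDist()   # mean 0, standard deviation 1

z_values = [-3.49, -1.0, 0.0, 1.0, 3.49]
areas = [standard.cdf(z) for z in z_values]   # cumulative area to the left of each z
for z, area in zip(z_values, areas):
    print(f"z = {z:5.2f}: cumulative area {area:.4f}")
```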


In addition to using the table, you can use technology to find the cumulative 
area that corresponds to a z-score. For instance, the next example shows how to 
use the Standard Normal Table and a TI-84 Plus to find the cumulative area that 
corresponds to a z-score. 




Using the Standard Normal Table 


1. Find the cumulative area that corresponds to a z-score of 1.15. 


2. Find the cumulative area that corresponds to a z-score of −0.24.


SOLUTION 

1. Find the area that corresponds to z = 1.15 by finding 1.1 in the left column 
and then moving across the row to the column under 0.05. The number in 
that row and column is 0.8749. So, the area to the left of z = 1.15 is 0.8749, 
as shown in the figure at the left. 


[Figure: excerpt of the Standard Normal Table; the row for 1.1 and the column for .05 intersect at .8749. TI-84 Plus display: normalcdf(−10000, 1.15) ≈ .8749]


You can use technology to find the cumulative area that corresponds to
z = 1.15, as shown at the left. Note that to specify the lower bound, use
−10,000, a value far enough below the mean to act as negative infinity.


2. Find the area that corresponds to z = −0.24 by finding −0.2 in the left
column and then moving across the row to the column under 0.04. The
number in that row and column is 0.4052. So, the area to the left of
z = −0.24 is 0.4052, as shown in the figure at the left.


[Figure: excerpt of the Standard Normal Table for negative z-scores; the row for −0.2 and the column for .04 intersect at .4052. TI-84 Plus display: normalcdf(−10000, −0.24) ≈ .4052]
You can use technology to find the cumulative area that corresponds to
z = −0.24, as shown at the left. Note that to specify the lower bound, use
−10,000.
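You can reproduce the same two lookups in Python. In this sketch, `NormalDist().cdf(z)` plays the role of the table. Subtracting the area to the left of −10,000 mimics the TI-84 Plus lower bound. The comparison is illustrative; it is not the calculator's own algorithm.

```python
from statistics import NormalDist

standard = NormalDist()   # mean 0, standard deviation 1

for z in (1.15, -0.24):
    table_style = round(standard.cdf(z), 4)              # four decimals, like the table
    ti_style = standard.cdf(z) - standard.cdf(-10_000)   # lower bound -10,000 as on the TI-84 Plus
    print(f"z = {z}: table-style {table_style}, calculator-style {ti_style:.7f}")
```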


TRY IT YOURSELF 3 
1. Find the cumulative area that corresponds to a z-score of −2.19.


2. Find the cumulative area that corresponds to a z-score of 2.17. 
Answer: Page A35 


When the z-score is not in the table, use the entry closest to it. For a z-score 
that is exactly midway between two z-scores, use the area midway between the 
corresponding areas. 
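You can express the two lookup rules as a small function. The sketch below uses only four entries: the z = 0.1 row, columns .00 to .03, of the Standard Normal Table (Table 4 in Appendix B). `TABLE_EXCERPT` and `table_area` are hypothetical names. The function assumes z has at most three decimal places.

```python
# Row z = 0.1 of the Standard Normal Table, columns .00 to .03,
# keyed by z in hundredths (10 means z = 0.10).
TABLE_EXCERPT = {10: 0.5398, 11: 0.5438, 12: 0.5478, 13: 0.5517}

def table_area(z):
    """Cumulative area from the excerpt using the closest-entry and midway rules.
    Assumes z has at most three decimal places and lies between 0.10 and 0.13."""
    t = round(z * 1000)            # z in thousandths, e.g. 0.125 -> 125
    lower, rem = divmod(t, 10)     # hundredth at or below z, plus leftover thousandths
    if rem == 5:                   # exactly midway: average the two neighboring areas
        return (TABLE_EXCERPT[lower] + TABLE_EXCERPT[lower + 1]) / 2
    nearest = lower + 1 if rem > 5 else lower   # otherwise use the closest entry
    return TABLE_EXCERPT[nearest]

print(table_area(0.12), table_area(0.128), table_area(0.125))
```

For z = 0.125, the midway rule gives (0.5478 + 0.5517)/2 = 0.54975.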



Tech Tip 


You can use technology to find the area under the standard normal curve. For instance, you can use the ShadeNorm feature on a TI-84 Plus to graph the area under the standard normal curve between z = −0.75 and z = 1.23, as shown below. The area between the two z-scores is shown below the graph. (Note that when you use technology, your answers may differ slightly from those found using the Standard Normal Table.)

[TI-84 Plus display: shaded area with low = −0.75, up = 1.23; Area = .664024]




You can use the following guidelines to find various types of areas under the
standard normal curve.


GUIDELINES 


Finding Areas Under the Standard Normal Curve 

1. Sketch the standard normal curve and shade the appropriate area under 
the curve. 

2. Find the area by following the directions for each case shown. 


a. To find the area to the left of z, find the area that corresponds to z in
the Standard Normal Table.

[Figure: (1) Use the table to find the area for the z-score. (2) The area to the left of z = 1.23 is 0.8907.]


b. To find the area to the right of z, use the Standard Normal Table to 
find the area that corresponds to z. Then subtract the area from 1. 


[Figure: (1) Use the table to find the area for the z-score. (2) The area to the left of z = 1.23 is 0.8907. (3) Subtract to find the area to the right of z = 1.23: 1 − 0.8907 = 0.1093.]


c. To find the area between two z-scores, find the area corresponding to 
each z-score in the Standard Normal Table. Then subtract the smaller 
area from the larger area. 


[Figure: (1) Use the table to find the areas for the z-scores. (2) The area to the left of z = 1.23 is 0.8907. (3) The area to the left of z = −0.75 is 0.2266. (4) Subtract to find the area of the region between the two z-scores: 0.8907 − 0.2266 = 0.6641.]
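The three cases in the guidelines map onto three small functions. This Python sketch uses `statistics.NormalDist` in place of the table, so its results are unrounded. The function names are hypothetical.

```python
from statistics import NormalDist

standard = NormalDist()   # mean 0, standard deviation 1

def area_left(z):
    """Case a: cumulative area to the left of z."""
    return standard.cdf(z)

def area_right(z):
    """Case b: subtract the area to the left of z from 1."""
    return 1 - standard.cdf(z)

def area_between(z1, z2):
    """Case c: subtract the smaller cumulative area from the larger one."""
    left1, left2 = standard.cdf(z1), standard.cdf(z2)
    return max(left1, left2) - min(left1, left2)

print(round(area_left(1.23), 4))
print(round(area_right(1.23), 4))
print(round(area_between(-0.75, 1.23), 4))
```

Rounded to four decimal places, these give 0.8907, 0.1093, and 0.6640. The last value differs from the table-based 0.6641 because the table rounds each cumulative area before the subtraction.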




[Excel display: 1-NORM.S.DIST(1.06, TRUE) returns 0.1445723]


Finding Area Under the Standard Normal Curve 


Find the area under the standard normal curve to the left of z = −0.99.
SOLUTION
The area under the standard normal curve to the left of z = −0.99 is shown.

[Figure: standard normal curve shaded to the left of z = −0.99; Area = 0.1611]

From the Standard Normal Table, this area is equal to 0.1611.

You can use technology to find the area to the left of z = −0.99, as shown
below.

[Excel display: NORM.S.DIST(-0.99, TRUE) returns 0.16108706]


TRY IT YOURSELF 4 


Find the area under the standard normal curve to the left of z = 2.13. 
Answer: Page A35 


Finding Area Under the Standard Normal Curve 
Find the area under the standard normal curve to the right of z = 1.06. 


SOLUTION 
The area under the standard normal curve to the right of z = 1.06 is shown.

[Figure: standard normal curve; the area to the left of z = 1.06 is 0.8554, and the shaded area to the right is 1 − 0.8554]

From the Standard Normal Table, the area to the left of z = 1.06 is 0.8554.

Because the total area under the curve is 1, the area to the right of z = 1.06 is
Area = 1 − 0.8554 = 0.1446.

You can use technology to find the area to the right of z = 1.06, as shown at
the left.


TRY IT YOURSELF 5 


Find the area under the standard normal curve to the right of z = −2.16.
Answer: Page A35 




Finding Area Under the Standard Normal Curve 


Find the area under the standard normal curve between z = −1.5 and
z = 1.25.

SOLUTION

The area under the standard normal curve between z = −1.5 and z = 1.25
is shown.

[Figure: standard normal curve; the area to the left of z = −1.5 is 0.0668, and the shaded area between z = −1.5 and z = 1.25 is 0.8944 − 0.0668]

From the Standard Normal Table, the area to the left of z = 1.25 is 0.8944 and
the area to the left of z = −1.5 is 0.0668. So, the area between z = −1.5 and
z = 1.25 is

Area = 0.8944 − 0.0668 = 0.8276.

Note that when you use technology, your answers may differ slightly from those found using the Standard Normal Table. For instance, when finding the area on a TI-84 Plus, you get the result shown at the right.

[TI-84 Plus display: normalcdf(−1.5, 1.25) ≈ .8275]

Interpretation So, 82.76% of the area under
the curve falls between z = −1.5 and z = 1.25.


TRY IT YOURSELF 6 


Find the area under the standard normal curve between z = −2.165 and
z = −1.35.
Answer: Page A35 


Because the normal distribution is a continuous probability distribution, the 
area under the standard normal curve to the left of a z-score gives the probability 
that z is less than that z-score. For instance, in Example 4, the area to the left of
z = −0.99 is 0.1611. So,

P(z < −0.99) = 0.1611

which is read as “the probability that z is less than −0.99 is 0.1611.” The table
shows the probabilities for Examples 5 and 6. (You will learn more about finding 
probabilities in the next section.) 


          | Area                                   | Probability
Example 5 | To the right of z = 1.06: 0.1446       | P(z > 1.06) = 0.1446
Example 6 | Between z = −1.5 and z = 1.25: 0.8276  | P(−1.5 < z < 1.25) = 0.8276

Recall from Section 2.4 that values lying more than two standard deviations
from the mean are considered unusual. Values lying more than three standard
deviations from the mean are considered very unusual. So, a z-score greater
than 2 or less than −2 is unusual. A z-score greater than 3 or less than −3 is
very unusual.
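You can write the thresholds above as a small classifier. In this sketch (`classify_z` is a hypothetical name), a z-score beyond ±3 gets the stronger label even though it is also beyond ±2. A z-score of exactly ±2 counts as not unusual, because the rule says greater than 2 or less than −2.

```python
def classify_z(z):
    """Label a z-score with the Section 2.4 thresholds (hypothetical helper)."""
    if abs(z) > 3:
        return "very unusual"
    if abs(z) > 2:
        return "unusual"
    return "not unusual"

for z in (1.5, -2.4, 3.2):
    print(z, classify_z(z))
```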




5.1 EXERCISES


For Extra Help: MyLab Statistics 


Building Basic Skills and Vocabulary 


1. Find three real-life examples of a continuous variable. Which do you think
may be normally distributed? Why?

2. In a normal distribution, which is greater, the mean or the median? Explain.

3. What is the total area under the normal curve?

4. What do the inflection points on a normal distribution represent? Where do
they occur?

5. Draw two normal curves that have the same mean but different standard
deviations. Describe the similarities and differences.

6. Draw two normal curves that have different means but the same standard
deviation. Describe the similarities and differences.

7. What is the mean of the standard normal distribution? What is the standard
deviation of the standard normal distribution?

8. Describe how you can transform a nonstandard normal distribution to the
standard normal distribution.

9. Getting at the Concept Why is it correct to say “a” normal distribution and
“the” standard normal distribution?

10. Getting at the Concept A z-score is 0. Which of these statements must be
true? Explain your reasoning.

(a) The mean is 0.

(b) The corresponding x-value is 0.

(c) The corresponding x-value is equal to the mean.


Graphical Analysis In Exercises 11-16, determine whether the graph could
represent a variable with a normal distribution. Explain your reasoning. If the graph
appears to represent a normal distribution, estimate the mean and standard deviation.

[Graphs for Exercises 11-16]




Using and Interpreting Concepts 


Finding Area In Exercises 17-22, find the area of the shaded region under the
standard normal curve. If convenient, use technology to find the area.

[Graphs for Exercises 17-22: shaded regions under the standard normal curve]


Finding Area In Exercises 23-36, find the indicated area under the standard
normal curve. If convenient, use technology to find the area.

23. To the left of z = 0.26            24. To the left of z = −2.87
25. To the left of z = −1.865          26. To the left of z = 1.185
27. To the right of z = −0.99          28. To the right of z = 2.65
29. To the right of z = −0.955         30. To the right of z = 1.175
31. Between z = 0 and z = 2.86         32. Between z = −2.48 and z = 0
33. Between z = −2.25 and z = 2.25
34. Between z = −1.96 and z = 1.96
35. To the left of z = −0.15 and to the right of z = 0.15
36. To the left of z = −2.56 and to the right of z = 1.25

37. Manufacturer Claims You work for a consumer watchdog publication and
are testing the advertising claims of a flash drive. The manufacturer claims
that the life spans of the flash drives are normally distributed, with a mean
of 80,000 runs and a standard deviation of 5000 runs. You test 14 flash drives
and record the life spans shown below.


81,000 79,686 45,386 53,196 78,484 76,584 87,812 
62,382 39,354 89,998 80,002 80,896 51,422 71,496 


(a) Draw a frequency histogram to display these data. Use five classes. Do 
the life spans appear to be normally distributed? Explain. 
(b) Find the mean and standard deviation of your sample. 


(c) Compare the mean and standard deviation of your sample with those in 
the manufacturer claim. Discuss the differences. 




38. Cocoa Consumption You are performing a study about monthly per
capita cocoa consumption in a city. A previous study found monthly per 
capita cocoa consumption to be normally distributed, with a mean of 
21 grams and a standard deviation of 4.5 grams. You randomly sample 
30 people and record the monthly cocoa consumptions shown below. 


9 17 11 20 18 22 29 21 25 16 23 21 12 21 25 
16 14 21 23 20 20 16 24 26 14 22 15 18 19 26 


(a) Draw a frequency histogram to display these data. Use seven classes. 
Do the consumptions appear to be normally distributed? Explain. 


(b) Find the mean and standard deviation of your sample. 


(c) Compare the mean and standard deviation of your sample with 
those of the previous study. Discuss the differences. 


Computing and Interpreting z-Scores In Exercises 39 and 40, (a) find
the z-score that corresponds to each value and (b) determine whether any of the 
values are unusual. 


39. Girls in ATAR The test scores for the girls who received their Australian
Tertiary Admission Rank (ATAR) are normally distributed. In a recent 
year, the mean test score for the girls was 66.25 and the standard deviation 
was 2.50. The test scores of four girls selected at random are 66.50, 69.75, 
72.50, and 60.75. (Source: Herald Sun) 


40. Boys in ATAR The test scores for the boys who received their Australian
Tertiary Admission Rank (ATAR) are normally distributed. In a recent 
year, the mean test score for the boys was 63.75 and the standard deviation 
was 1.75. The test scores of four boys selected at random are 67.50, 66.50, 
61.25, and 63.75. (Source: Herald Sun) 


Finding Probability in Exercises 41-46, find the probability of z occurring 
in the shaded region of the standard normal distribution. If convenient, use 
technology to find the probability. 


[Graphs for Exercises 41-46: shaded regions under the standard normal curve]




Finding Probability In Exercises 47-56, find the indicated probability using the
standard normal distribution. If convenient, use technology to find the probability.

47. P(z < 2.32)             48. P(z < −0.26)           49. P(z > 3.285)
50. P(z > −2.55)            51. P(−1.35 < z < 0)       52. P(0 < z < 2.315)
53. P(−2.96 < z < 2.96)     54. P(−0.18 < z < 0.18)
55. P(z < −3.1 or z > 3.1)  56. P(z < −2.46 or z > 2.46)


Extending Concepts 


57. Writing Draw a normal curve with a mean of 60 and a standard deviation 
of 12. Describe how you constructed the curve and discuss its features. 


58. Writing Draw a normal curve with a mean of 450 and a standard deviation
of 50. Describe how you constructed the curve and discuss its features. 


Uniform Distribution A uniform distribution is a continuous probability 
distribution for a random variable x between two values a and b (a < b), where 
a ≤ x ≤ b and all of the values of x are equally likely to occur. The graph of a
uniform distribution is shown below. 


[Figure: graph of a uniform distribution from x = a to x = b]


The probability density function of a uniform distribution is

$$y = \frac{1}{b - a}$$

on the interval from x = a to x = b. For any value of x less than a or greater
than b, y = 0. In Exercises 59 and 60, use this information.


59. Show that the probability density function of a uniform distribution satisfies 
the two conditions for a probability density function. 


60. For two values c and d, where a ≤ c < d ≤ b, the probability that x lies between
c and d is equal to the area under the curve between c and d, as shown below.

[Figure: uniform distribution from x = a to x = b with the region between c and d shaded red]


So, the area of the red region equals the probability that x lies between c and d. 
For a uniform distribution from a = 1 to b = 25, find the probability that 
(a) x lies between 2 and 8. 

(b) x lies between 4 and 12. 

(c) x lies between 5 and 17. 

(d) x lies between 8 and 14. 




5.2 Normal Distributions: Finding Probabilities


What You Should Learn

» How to find probabilities for normally distributed variables using a table and using technology

Probability and Normal Distributions

When a random variable x is normally distributed, you can find the probability
that x will lie in an interval by calculating the area under the normal curve for
the interval. To find the area under any normal curve, first convert the upper
and lower bounds of the interval to z-scores. Then use the standard normal
distribution to find the area. For instance, consider a normal curve with μ = 500
and σ = 100, as shown at the upper left. The value of x one standard deviation
above the mean is μ + σ = 500 + 100 = 600. Now consider the standard
normal curve shown at the lower left. The value of z one standard deviation
above the mean is μ + σ = 0 + 1 = 1. Because a z-score of 1 corresponds to
an x-value of 600, and areas are not changed with a transformation to a standard
normal curve, the shaded areas in the figures at the left are equal.

[Figure: a normal curve with μ = 500 and σ = 100 (horizontal axis 200 to 800) and the standard normal curve, with the same shaded area]
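You can spot-check numerically the claim that the transformation leaves areas unchanged. This sketch uses the curve with μ = 500 and σ = 100. As an assumed example region, it takes the interval from the mean to one standard deviation above it. `to_z` is a hypothetical helper.

```python
from statistics import NormalDist

x_dist = NormalDist(mu=500, sigma=100)   # the nonstandard curve in the text
z_dist = NormalDist()                    # the standard normal curve

def to_z(x, dist):
    """Convert an x-value to a z-score for the given normal distribution."""
    return (x - dist.mean) / dist.stdev

# Assumed example region: from the mean (500) to one standard deviation above it (600).
area_x = x_dist.cdf(600) - x_dist.cdf(500)
area_z = z_dist.cdf(to_z(600, x_dist)) - z_dist.cdf(to_z(500, x_dist))
print(round(area_x, 4), round(area_z, 4))
```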


Finding Probabilities for Normal Distributions 


A national study found that college students with jobs worked an average of 
22 hours per week. The standard deviation is 9 hours. A college student with a 
job is selected at random. Find the probability that the student works for less 
than 4 hours per week. Assume that the lengths of time college students work 
are normally distributed and are represented by the variable x. (Adapted from 
Sallie Mae/Ipsos Public Affairs) 


SOLUTION 


The figure shows a normal curve with μ = 22,
σ = 9, and the shaded area for x less than 4.
The z-score that corresponds to 4 hours is

$$z = \frac{x - \mu}{\sigma} = \frac{4 - 22}{9} = -2.$$

The Standard Normal Table shows that
P(z < −2) = 0.0228.
The probability that the student works for less than 4 hours per week is 0.0228.

[Figure: normal curve with μ = 22 and σ = 9, shaded for x less than 4; horizontal axis Hours worked]

To learn how to determine whether a random sample is taken from a normal distribution, see Appendix C.


Interpretation So, 2.28% of college students with jobs worked for less than 
4 hours per week. Because 2.28% is less than 5%, this is an unusual event. 
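With technology, you can find the probability in Example 1 without a table. This Python sketch uses `statistics.NormalDist` with μ = 22 and σ = 9. It also applies the 5% cutoff for an unusual event from the interpretation above.

```python
from statistics import NormalDist

hours = NormalDist(mu=22, sigma=9)      # weekly hours worked, from Example 1
z = (4 - hours.mean) / hours.stdev      # z-score for 4 hours
p = hours.cdf(4)                        # P(x < 4), computed without the table
is_unusual = p < 0.05                   # the 5% cutoff used in the interpretation
print(z, round(p, 4), is_unusual)
```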


TRY IT YOURSELF 1 


The average speed of vehicles traveling on a stretch of highway is 67 miles per 
hour with a standard deviation of 3.5 miles per hour. A vehicle is selected at 
random. What is the probability that it is violating the speed limit of 70 miles 
per hour? Assume the speeds are normally distributed and are represented by 
the variable x. 

Answer: Page A35 


In Example 1, because P(z < −2) = P(x < 4), another way to write the
probability is P(x < 4) = 0.0228. 


[Figures: normal curves for the time spent in the store in Example 2; horizontal axis Time (in minutes), 10 to 80]


SECTION 5.2 Normal Distributions: Finding Probabilities 269 


Finding Probabilities for Normal Distributions 


A survey indicates that for each trip to a supermarket, a shopper spends an 
average of 43 minutes with a standard deviation of 12 minutes in the store. The 
lengths of time spent in the store are normally distributed and are represented 
by the variable x. A shopper enters the store. (a) Find the probability that the 
shopper will be in the store for each interval of time listed below. (b) When 
200 shoppers enter the store, how many shoppers would you expect to be in 
the store for each interval of time listed below? (Adapted from Time Use Institute) 


1. Between 22 and 52 minutes 
2. More than 37 minutes 


SOLUTION 


1. (a) The figure at the left shows a normal curve with μ = 43 minutes,
σ = 12 minutes, and the shaded area for x between 22 and 52 minutes.
The z-scores that correspond to 22 minutes and to 52 minutes are

$$z_1 = \frac{22 - 43}{12} = -1.75 \quad\text{and}\quad z_2 = \frac{52 - 43}{12} = 0.75.$$


So, the probability that a shopper will be in the store between 22 and 
52 minutes is 


P(22 < x < 52) = P(−1.75 < z < 0.75)
= P(z < 0.75) − P(z < −1.75)
= 0.7734 − 0.0401
= 0.7333.


(b) Interpretation When 200 shoppers enter the store, you would expect
about 200(0.7333) = 146.66 ≈ 147 shoppers to be in the store between
22 and 52 minutes.


2. (a) The figure at the left shows a normal curve with μ = 43 minutes,
σ = 12 minutes, and the shaded area for x greater than 37. The z-score
that corresponds to 37 minutes is

$$z = \frac{37 - 43}{12} = -0.5.$$

So, the probability that a shopper will be in the store more than
37 minutes is
P(x > 37) = P(z > −0.5)
= 1 − P(z < −0.5)
= 1 − 0.3085
= 0.6915.

(b) Interpretation When 200 shoppers enter the store, you would expect
about 200(0.6915) = 138.3 ≈ 138 shoppers to be in the store more
than 37 minutes.
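You can reproduce both parts of Example 2 in a few lines. This sketch computes the probabilities from unrounded z-scores. It then multiplies by 200 and rounds to the nearest whole shopper, as the interpretations do.

```python
from statistics import NormalDist

minutes = NormalDist(mu=43, sigma=12)   # time spent in the store
n_shoppers = 200

p_between = minutes.cdf(52) - minutes.cdf(22)   # part 1: P(22 < x < 52)
p_more = 1 - minutes.cdf(37)                    # part 2: P(x > 37)

for label, p in [("22 to 52 min", p_between), ("more than 37 min", p_more)]:
    print(label, round(p, 4), "expected shoppers:", round(n_shoppers * p))
```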


TRY IT YOURSELF 2 


What is the probability that the shopper in Example 2 will be in the supermarket
between 31 and 58 minutes? When 200 shoppers enter the store, how many
shoppers would you expect to be in the store between 31 and 58 minutes?
Answer: Page A35 




Picturing the World


In baseball, a batting average is 
the number of hits divided by the 
number of at bats. The batting 
averages of all Major League 
Baseball players in a recent year 
can be approximated by a normal 
distribution, as shown in the 
figure. The mean of the batting 
averages is 0.255 and the standard 
deviation is 0.010. (Source: Major 
League Baseball) 


[Figure: Major League Baseball batting averages, normal curve with μ = 0.255; horizontal axis Batting average, 0.23 to 0.28]


What percent of the players 
have a batting average of 0.270 
or greater? Out of 40 players on 
a roster, how many would you 
expect to have a batting average 
of 0.270 or greater? 


Another way to find normal probabilities is to use technology. You can find
normal probabilities using Minitab, Excel, StatCrunch, and the TI-84 Plus.


Using Technology to Find Normal Probabilities 


Triglycerides are a type of fat in the bloodstream. The mean triglyceride level 
for U.S. adults (ages 20 and older) is 97 milligrams per deciliter. Assume the 
triglyceride levels of U.S. adults who are at least 20 years old are normally 
distributed, with a standard deviation of 25 milligrams per deciliter. You 
randomly select a U.S. adult who is at least 20 years old. What is the probability 
that the person’s triglyceride level is less than 100? (Triglyceride levels less 
than 150 milligrams per deciliter are considered normal.) Use technology to 
find the probability. (Adapted from JAMA Cardiology) 


SOLUTION 

Minitab, Excel, StatCrunch, and the TI-84 Plus each have features that allow 
you to find normal probabilities without first converting to standard z-scores. 
Note that to use these features, you must specify the mean and standard 
deviation of the population, as well as any x-values that determine the interval. 
You are given that μ = 97 and σ = 25, and you want to find the probability
that the person’s triglyceride level is less than 100, or P(x < 100). 


MINITAB
Cumulative Distribution Function
Normal with mean = 97 and standard deviation = 25
x      P( X ≤ x )
100    0.547758

EXCEL
0.547758426

STATCRUNCH
Normal Distribution
Mean: 97   Std. Dev.: 25
P(X ≤ 100) = 0.54775843

TI-84 PLUS
normalcdf(−10000, 100, 97, 25)
≈ .547758


From the displays, you can see that P(x < 100) ≈ 0.548.


Interpretation The probability that the person’s triglyceride level is less than 
100 is about 0.548, or 54.8%. 
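Here is a minimal Python counterpart to the displays above. The function is a hypothetical helper whose argument order mirrors the TI-84 Plus call normalcdf(lower, upper, μ, σ). It relies on `statistics.NormalDist` and is not the calculator's own implementation.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under the normal curve with mean mu and standard deviation sigma
    between lower and upper (hypothetical helper; argument order mirrors the
    TI-84 Plus normalcdf(lower, upper, mu, sigma))."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

# P(x < 100) for triglyceride levels with mu = 97 and sigma = 25;
# -10,000 stands in for negative infinity, as on the calculator.
p = normalcdf(-10_000, 100, mu=97, sigma=25)
print(round(p, 3))
```

With its default arguments, the same helper gives standard normal areas. For example, normalcdf(−1.5, 1.25) is the area between z = −1.5 and z = 1.25 found in Section 5.1.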


TRY IT YOURSELF 3 


A U.S. adult who is at least 20 years old is selected at random. What is the
probability that the person’s triglyceride level is between 100 and 150? Use 
technology to find the probability. 

Answer: Page A35 




5.2 EXERCISES

For Extra Help: MyLab Statistics


Building Basic Skills and Vocabulary 


Computing Probabilities for Normal Distributions In Exercises 1-6,
the random variable x is normally distributed with mean μ = 174 and standard
deviation σ = 20. Find the indicated probability.


1. P(x < 180) 2. P(x < 160) 3. P(x > 185) 
4. P(x > 170) 5. P(170 < x < 195) 6. P(155 < x < 172) 


Using and Interpreting Concepts 


Finding Probabilities for Normal Distributions In Exercises 7-12, find
the indicated probabilities. If convenient, use technology to find the probabilities. 


7. World Happiness In a recent study on world happiness, participants were 
asked to evaluate their current lives on a scale from 0 to 10, where 0 represents 
the worst possible life and 10 represents the best possible life. The responses 
were normally distributed, with a mean of 5.4 and a standard deviation of 2.2. 
Find the probability that a randomly selected study participant’s response 
was (a) less than 4, (b) between 4 and 6, and (c) more than 8. Identify any 
unusual events in parts (a)-(c). Explain your reasoning. (Source: The Earth 
Institute, Columbia University) 


8. Incomes of CEOs In a survey of South African CEOs, the annual incomes
were normally distributed, with a mean of $7.14 million and a standard
deviation of $0.5 million. Find the probability that a randomly selected CEO
has an annual income that is (a) less than $6.5 million, (b) between $7 million
and $7.5 million, and (c) more than $8 million. Identify any unusual events in
parts (a)-(c). Explain your reasoning. (Adapted from Fin24)


9. Ages of Prime Ministers The ages of prime ministers of Sri Lanka, when
they were first sworn into office, are normally distributed with a mean of
58.21 years and a standard deviation of 11.56 years. Find the probability that
the age of a randomly selected prime minister of Sri Lanka when he was
sworn into office was (a) less than 50 years, (b) between 55 and 60 years, and
(c) more than 70 years. Identify any unusual events in parts (a)-(c). Explain
your reasoning. (Source: Prime Minister’s Office of the Democratic Socialist
Republic of Sri Lanka)


10. MCAT Scores In a recent year, the MCAT scores for the critical analysis
and reasoning skills portion of the test were normally distributed, with a 
mean of 124.9 and a standard deviation of 3.0. Find the probability that 
a randomly selected medical student who took the MCAT has a critical 
analysis and reasoning skills score that is (a) less than 120, (b) between 122 
and 128, and (c) more than 130. Identify any unusual events in parts (a)-(c). 
Explain your reasoning. (Source: Association of American Medical Colleges) 


11. Maintenance Charges The monthly maintenance charges of an organization 
are normally distributed, with a mean of $500 and a standard deviation of 
$150. Find the probability that a randomly selected monthly maintenance charge
is (a) less than $200, (b) between $300 and $600, and (c) more than $800.


12. Gym Schedule The amounts of time per workout a person uses an 
elliptical cross trainer are normally distributed, with a mean of 10 minutes 
and a standard deviation of 2 minutes. Find the probability that a randomly
selected person uses a cross trainer for (a) less than 7 minutes, (b) between
10 and 15 minutes, and (c) more than 16 minutes.


272 


CHAPTER 5 Normal Probability Distributions 


Graphical Analysis In Exercises 13-16, a member is selected at random from 
the population represented by the graph. Find the probability that the member 
selected at random is from the shaded region of the graph. Assume the variable x 
is normally distributed. 


13. [Graph: SAT Total Scores; shaded region 750 < x < 1000; μ = 1083, σ = 193; horizontal axis: Score] (Source: The College Board)

14. [Graph: ACT Composite Scores; shaded region ending at x = 33; horizontal axis: Score] (Source: ACT, Inc.)

15. [Graph: Pregnancy Length in a Population of New Mothers; shaded region 285 < x < 294; horizontal axis: Pregnancy length (in days)]

16. [Graph: Red Blood Cell Count in a Population of Adult Males; shaded region 4.5 < x < 5.5; horizontal axis: Count (in million cells/microliter)]


Using Normal Distributions In Exercises 17-20, answer the questions
about the specified normal distribution. 


17. SAT Total Scores Use the normal distribution in Exercise 13.

(a) What percent of the SAT total scores are less than 1200?

(b) Out of 500 randomly selected SAT total scores, about how many would
you expect to be greater than 1300?

18. ACT Composite Scores Use the normal distribution in Exercise 14.

(a) What percent of the ACT composite scores are less than 16?

(b) Out of 1000 randomly selected ACT composite scores, about how many
would you expect to be greater than 23?

19. Pregnancy Length Use the normal distribution in Exercise 15.

(a) What percent of the new mothers had a pregnancy length of less than
280 days?

(b) What percent of the new mothers had a pregnancy length of between
250 and 275 days?

(c) Out of 500 randomly selected new mothers, about how many would you
expect to have had a pregnancy length of greater than 290 days?

20. Red Blood Cell Count Use the normal distribution in Exercise 16.

(a) What percent of the adult males have a red blood cell count less than
5.75 million cells per microliter?

(b) What percent of the adult males have a red blood cell count between
4.75 and 5.25 million cells per microliter?

(c) Out of 500 randomly selected adult males, about how many would you
expect to have a red blood cell count greater than 5.0 million cells per
microliter?




Extending Concepts 


Control Charts Statistical process control (SPC) is the use of statistics to 
monitor and improve the quality of a process, such as manufacturing an engine 
part. In SPC, information about a process is gathered and used to determine 
whether a process is meeting all of the specified requirements. One tool used 
in SPC is a control chart. When individual measurements of a variable x are 
normally distributed, a control chart can be used to detect processes that are 
possibly out of statistical control. Three warning signals that a control chart uses 
to detect a process that may be out of control are listed below. 


(1) A point lies beyond three standard deviations of the mean. 
(2) There are nine consecutive points that fall on one side of the mean. 


(3) At least two of three consecutive points lie more than two standard deviations 
from the mean. 
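The three warning signals translate directly into checks on a list of observations taken in time order. The Python sketch below is an illustration only: the function name `out_of_control_signals` is hypothetical, "beyond" is read as strictly greater than, and signal (3) is applied on either side of the mean exactly as worded above (some SPC conventions, such as the Western Electric rules, require the two points to be on the same side).

```python
def out_of_control_signals(points, mu, sigma):
    """Return the numbers of the warning signals (1, 2, 3) that the data trigger.

    points: observations in time order; mu, sigma: the process mean and
    standard deviation used to draw the control chart.
    """
    signals = set()

    # Signal (1): a point lies beyond three standard deviations of the mean.
    if any(abs(x - mu) > 3 * sigma for x in points):
        signals.add(1)

    # Signal (2): nine consecutive points fall on one side of the mean.
    for i in range(len(points) - 8):
        window = points[i:i + 9]
        if all(x > mu for x in window) or all(x < mu for x in window):
            signals.add(2)
            break

    # Signal (3): at least two of three consecutive points lie more than
    # two standard deviations from the mean (either side, as worded above).
    for i in range(len(points) - 2):
        window = points[i:i + 3]
        if sum(abs(x - mu) > 2 * sigma for x in window) >= 2:
            signals.add(3)
            break

    return sorted(signals)


# Hypothetical gear diameters (inches) for a process with mean 3 and sigma 0.2
print(out_of_control_signals([3.1, 2.9, 3.5, 3.5, 3.0], mu=3, sigma=0.2))  # [3]
```

An empty list means none of the three signals was found, which is the situation in which the process would be judged in control.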


In Exercises 21-24, a control chart is shown. Each chart has horizontal lines
drawn at the mean μ, at μ ± 2σ, and at μ ± 3σ. Determine whether the process
shown is in control or out of control. Explain.


21. A gear has been designed to have a diameter of 3 inches. The
standard deviation of the process is 0.2 inch.

[Control chart: Gears; horizontal axis: Observation number]

22. A nail has been designed to have a length of 4 inches. The
standard deviation of the process is 0.12 inch.

[Control chart: Nails; horizontal axis: Observation number]

23. A liquid-dispensing machine has been designed to fill bottles with
1 liter of liquid. The standard deviation of the process is
0.1 liter.

[Control chart: Liquid Dispenser; horizontal axis: Observation number (2 to 12)]

24. An engine part has been designed to have a diameter of
55 millimeters. The standard deviation of the process is
0.001 millimeter.

[Control chart: Engine Part; vertical axis from 54.9975 to 55.0050; horizontal axis: Observation number (2 to 12)]




5.3 Normal Distributions: Finding Values


What You Should Learn 
» How to find a z-score given the
area under the normal curve

» How to transform a z-score to
an x-value

» How to find a specific data
value of a normal distribution
given the probability
given the probability 


Finding z-Scores • Transforming a z-Score to an x-Value • Finding a
Specific Data Value for a Given Probability


Finding z-Scores 


In Section 5.2, you were given a normally distributed random variable x and you 
found the probability that x would lie in an interval by calculating the area under 
the normal curve for the interval. 

But what if you are given a probability and want to find a value? For 
instance, a university might want to know the lowest test score a student can have 
on an entrance exam and still be in the top 10%, or a medical researcher might 
want to know the cutoff values for selecting the middle 90% of patients by age. 
In this section, you will learn how to find a value given an area under a normal 
curve (or a probability), as shown in the next example. 


Finding a z-Score Given an Area 
1. Find the z-score that corresponds to a cumulative area of 0.3632. 


2. Find the z-score that has 10.75% of the distribution’s area to its right. 


SOLUTION 


1. Find the z-score that corresponds to an area of 0.3632 by locating 0.3632
in the Standard Normal Table. The values at the beginning of the
corresponding row and at the top of the corresponding column give the
z-score. For this area, the row value is -0.3 and the column value is 0.05. So,
the z-score is -0.35, as shown in the figure at the left.


[Excerpt of the Standard Normal Table: the area 0.3632 lies in the row for z = -0.3 and the column for .05.]


2. Because the area to the right is 0.1075, the cumulative area is 
1 - 0.1075 = 0.8925. Find the z-score that corresponds to an area of 0.8925
by locating 0.8925 in the Standard Normal Table. For this area, the row 
value is 1.2 and the column value is 0.04. So, the z-score is 1.24, as shown in 
the figure at the left. 


[Excerpt of the Standard Normal Table: the area 0.8925 lies in the row for z = 1.2 and the column for .04.]


SECTION 5.3 Normal Distributions: Finding Values 275 


TRY IT YOURSELF 1 
1. Find the z-score that has 96.16% of the distribution’s area to its right. 


2. Find the positive z-score for which 95% of the distribution’s area lies
between -z and z.


Tech Tip

You can use technology to find the z-scores that correspond to
cumulative areas. For instance, you can use a TI-84 Plus to find the
z-scores in Example 1, as shown below.


Answer: Page A35 


In Example 1, the given areas correspond to entries in the Standard Normal 
Table. In most cases, the area will not be an entry in the table. In these cases, use 
the entry closest to it (or use technology, as shown at the left and in Example 2). 
When the area is halfway between two area entries, use the z-score halfway 
between the corresponding z-scores. 
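The table rule just described (use the closest entry, or the z-score halfway between two entries when the area falls exactly between them) can be emulated in a short Python sketch. As an assumption, it rebuilds a four-decimal Standard Normal Table from `statistics.NormalDist` instead of typing in the printed table, and the helper name `z_from_table` is hypothetical. If several entries tie, their z-scores are averaged, which reproduces the halfway rule in the two-entry case.

```python
from statistics import NormalDist

# Rebuild a four-decimal Standard Normal Table for z = -3.49, -3.48, ..., 3.49.
# z is stored in hundredths and areas in ten-thousandths so comparisons are exact.
TABLE = {k: round(NormalDist().cdf(k / 100) * 10_000) for k in range(-349, 350)}


def z_from_table(area):
    """Look up the z-score for a cumulative area using the table rule."""
    target = round(area * 10_000)
    best = min(abs(a - target) for a in TABLE.values())
    # All z-scores whose table area is closest to the target area
    closest = [k for k, a in TABLE.items() if abs(a - target) == best]
    # One closest entry: use it. A tie (target halfway between two entries):
    # use the z-score halfway between them.
    return sum(closest) / len(closest) / 100


print(z_from_table(0.05), z_from_table(0.90))
```

For an area of 0.05 the closest entries are 0.0495 (z = -1.65) and 0.0505 (z = -1.64), so the sketch returns -1.645; for 0.90 the closest entry is 0.8997, giving 1.28.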


[TI-84 Plus display for the Tech Tip; the last result shown is 1.239935475]

In Section 2.5, you learned that percentiles divide a data set into 100 equal
parts. To find a z-score that corresponds to a percentile, you can use the Standard
Normal Table. Recall that if a value x represents the 83rd percentile P₈₃, then
83% of the data values are below x and 17% of the data values are above x.


Finding a z-Score Given a Percentile 
Find the z-score that corresponds to each percentile. 
1. P₅   2. P₅₀   3. P₉₀


SOLUTION 


1. To find the z-score that corresponds to P₅, find the z-score that corresponds
to an area of 0.05 (see upper figure) by locating 0.05 in the Standard Normal
Table. The areas closest to 0.05 in the table are 0.0495 (z = -1.65) and
0.0505 (z = -1.64). Because 0.05 is halfway between the two areas in the
table, use the z-score that is halfway between -1.64 and -1.65. So, the
z-score that corresponds to an area of 0.05 is -1.645.

[Upper figure: area 0.05 to the left of z = -1.645]


2. To find the z-score that corresponds to P₅₀, find the z-score that corresponds
to an area of 0.5 (see middle figure) by locating 0.5 in the Standard Normal
Table. The area closest to 0.5 in the table is 0.5000, so the z-score that
corresponds to an area of 0.5 is 0.

[Middle figure: area 0.5 to the left of z = 0]


3. To find the z-score that corresponds to P₉₀, find the z-score that corresponds
to an area of 0.9 (see lower figure) by locating 0.9 in the Standard Normal
Table. The area closest to 0.9 in the table is 0.8997, so the z-score that
corresponds to an area of 0.9 is about 1.28.


You can use technology to find the z-score that corresponds to each percentile,
as shown below. Remember that when you use technology, your answers may
differ slightly from those found using the Standard Normal Table.


EXCEL

=NORM.INV(0.05,0,1)    -1.644853627
=NORM.INV(0.50,0,1)    0
=NORM.INV(0.90,0,1)    1.281551566


TRY IT YOURSELF 2 
Find the z-score that corresponds to each percentile. 


Answer: Page A35 




[Figure: normal curve of cat weights; z = -0.44, 0, and 1.96 correspond to 8.12, 9, and 12.92 pounds; horizontal axis: Weight (in pounds)]


Transforming a z-Score to an x-Value 


Recall that to transform an x-value to a z-score, you can use the formula

$$z = \frac{x - \mu}{\sigma}.$$

This formula gives z in terms of x. When you solve this formula for x, you get a
new formula that gives x in terms of z.

$$\begin{aligned}
z &= \frac{x - \mu}{\sigma} && \text{Formula for } z \text{ in terms of } x\\
z\sigma &= x - \mu && \text{Multiply each side by } \sigma.\\
\mu + z\sigma &= x && \text{Add } \mu \text{ to each side.}\\
x &= \mu + z\sigma && \text{Interchange sides.}
\end{aligned}$$

Transforming a z-Score to an x-Value

To transform a standard z-score to an x-value in a given population, use
the formula

$$x = \mu + z\sigma.$$


Finding an x-Value Corresponding to a z-Score 


A veterinarian records the weights of cats treated at a clinic. The weights 
are normally distributed, with a mean of 9 pounds and a standard deviation 
of 2 pounds. Find the weight x corresponding to each z-score. Interpret the 
results. 


1. z = 1.96   2. z = -0.44   3. z = 0
SOLUTION 


The x-value that corresponds to each standard z-score is calculated using the
formula x = μ + zσ. Note that μ = 9 and σ = 2.


1. For z = 1.96, the corresponding weight x is 
x = 9 + 1.96(2) = 12.92 pounds. 


2. For z = -0.44, the corresponding weight x is
x = 9 + (-0.44)(2) = 8.12 pounds.

3. For z = 0, the corresponding weight x is 
x = 9 + 0(2) = 9 pounds. 


Interpretation From the figure at the left, you can see that 12.92 pounds is 
to the right of the mean, 8.12 pounds is to the left of the mean, and 9 pounds 
is equal to the mean. 
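The formula x = μ + zσ is a one-line function. The Python sketch below reproduces the three cat weights; the helper name `z_to_x` is illustrative, not from the text.

```python
def z_to_x(z, mu, sigma):
    """Transform a standard z-score to an x-value: x = mu + z * sigma."""
    return mu + z * sigma


# Cat weights: mean 9 pounds, standard deviation 2 pounds
for z in (1.96, -0.44, 0):
    print(z, round(z_to_x(z, mu=9, sigma=2), 2))
# 1.96 12.92
# -0.44 8.12
# 0 9
```

A positive z gives a weight above the mean, a negative z a weight below it, and z = 0 returns the mean itself, matching the interpretation above.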


TRY IT YOURSELF 3 


A veterinarian records the weights of dogs treated at a clinic. The weights 
are normally distributed, with a mean of 52 pounds and a standard deviation 
of 15 pounds. Find the weight x corresponding to each z-score. Interpret the 
results. 


1. z = -2.33   2. z = 3   3. z = 0.58
Answer: Page A35 


Picturing the World


Many investors choose mutual 
funds as a way to invest in the 
stock market. The mean annual 
rate of return for large growth 
mutual funds during a recent 
five-year period was about 
12.1% with a standard deviation 
of 1.8%. (Adapted from Morningstar) 


[Figure: Annual Rate of Return for Large Growth Mutual Funds; horizontal axis: Rate of return, with ticks at 0.08, 0.12, and 0.16]


Between what two values does the 
middle 90% of the data lie? 


TI-84 PLUS
invNorm(.9,50,10)
62.81551567




Finding a Specific Data Value for a 
Given Probability 


You can also use the normal distribution to find a specific data value (x-value) 
for a given probability, as shown in Examples 4 and 5. 


Finding a Specific Data Value 

Scores for the California Peace Officer Standards and Training test are 
normally distributed, with a mean of 50 and a standard deviation of 10. An 
agency will only hire applicants with scores in the top 10%. What is the lowest 
score an applicant can earn and still be eligible to be hired by the agency? 
(Source: State of California) 


SOLUTION 
Exam scores in the top 10% correspond to the shaded region shown. 


[Figure: Scores for the California Peace Officer Standards and Training Test; top 10% shaded; horizontal axis: Test score, centered at 50]


A test score in the top 10% is any score above the 90th percentile. To find 
the score that represents the 90th percentile, you must first find the z-score 
that corresponds to a cumulative area of 0.9. In the Standard Normal Table, 
the area closest to 0.9 is 0.8997. So, the z-score that corresponds to an area of 
0.9 is z = 1.28. To find the x-value, note that μ = 50 and σ = 10, and use the
formula x = μ + zσ, as shown.

$$\begin{aligned}
x &= \mu + z\sigma\\
&= 50 + 1.28(10)\\
&= 62.8
\end{aligned}$$


You can check this answer using technology. For instance, you can use a 
TI-84 Plus to find the x-value, as shown at the left. 


Interpretation The lowest score an applicant can earn and still be eligible to 
be hired by the agency is about 63. 
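The two routes to this cutoff can be compared side by side in a short Python sketch (standard library only, Python 3.8 or later). The table route rounds z to 1.28, while `NormalDist.inv_cdf` uses the exact 90th percentile, which is why technology gives a slightly larger value.

```python
from statistics import NormalDist

mu, sigma = 50, 10  # test-score distribution from the example

# Table approach: the closest table area to 0.9 is 0.8997, so z is about 1.28
x_table = mu + 1.28 * sigma

# Technology approach: the exact 90th percentile of the normal distribution
x_exact = NormalDist(mu, sigma).inv_cdf(0.90)

print(round(x_table, 1), round(x_exact, 2))  # 62.8 62.82
```

Either value rounds to 63, consistent with the interpretation above.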


TRY IT YOURSELF 4 


A researcher tests the braking distances of several cars. The braking distance 
from 60 miles per hour to a complete stop on dry pavement is measured in 
feet. The braking distances of a sample of cars are normally distributed, with 
a mean of 129 feet and a standard deviation of 5.18 feet. What is the longest 
braking distance one of these cars could have and still be in the bottom 1%? 
(Adapted from Consumer Reports) 

Answer: Page A35 




Finding a Specific Data Value 


In a randomly selected sample of women ages 20-34, the mean total 
cholesterol level is 179 milligrams per deciliter with a standard deviation of 
38.9 milligrams per deciliter. Assume the total cholesterol levels are normally 
distributed. Find the highest total cholesterol level a woman in this 20-34 age 
group can have and still be in the bottom 1%. (Adapted from National Center for 
Health Statistics) 


SOLUTION 
Total cholesterol levels in the lowest 1% correspond to the shaded region 
shown. 


[Figure: Total Cholesterol Levels in Women Ages 20-34; lowest 1% shaded; horizontal axis: Total cholesterol level (in mg/dL)]


A total cholesterol level in the lowest 1% is any level below the 1st percentile.
To find the level that represents the 1st percentile, you must first find the
z-score that corresponds to a cumulative area of 0.01. In the Standard Normal
Table, the area closest to 0.01 is 0.0099. So, the z-score that corresponds to an
area of 0.01 is z = -2.33. To find the x-value, note that μ = 179 and σ = 38.9,
and use the formula x = μ + zσ, as shown.

$$\begin{aligned}
x &= \mu + z\sigma\\
&= 179 + (-2.33)(38.9)\\
&\approx 88.36
\end{aligned}$$


You can check this answer using technology. For instance, you can use Excel 
to find the x-value, as shown below. 


EXCEL

=NORM.INV(0.01,179,38.9)
88.5050677


Interpretation The value that separates the lowest 1% of total cholesterol 
levels for women in the 20-34 age group from the highest 99% is about 
88 milligrams per deciliter. 
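A useful habit after finding a cutoff is to check it by going back the other way: the cumulative area to the left of the cutoff should equal the requested probability. The Python sketch below (standard library, Python 3.8 or later; variable names are illustrative) does this for the cholesterol example and also shows that the table answer, which uses z = -2.33, leaves slightly less than 1% below it.

```python
from statistics import NormalDist

cholesterol = NormalDist(mu=179, sigma=38.9)

# Cutoff for the lowest 1% (the 1st percentile), as Excel's NORM.INV computes it
cutoff = cholesterol.inv_cdf(0.01)

# Round-trip check: the area to the left of the cutoff should be 0.01
area_below = cholesterol.cdf(cutoff)

# The table-based answer uses z = -2.33, whose table area is 0.0099
area_below_table_answer = cholesterol.cdf(179 + (-2.33) * 38.9)

print(round(cutoff, 2), round(area_below, 4), round(area_below_table_answer, 4))
```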


TRY IT YOURSELF 5 


The lengths of time employees have worked at a corporation are normally 
distributed, with a mean of 11.2 years and a standard deviation of 2.1 years. 
In a company cutback, the lowest 10% in seniority are laid off. What is the 
maximum length of time an employee could have worked and still be laid off? 

Answer: Page A35 




5.3 EXERCISES


Building Basic Skills and Vocabulary 


Finding a z-Score In Exercises 1-16, use the Standard Normal Table or 
technology to find the z-score that corresponds to the cumulative area or percentile. 


1. 0.8365 2. 0.7357 3. 0.063 4. 0.3409 
5. 0.0284 6. 0.81 7. 0.8859 8. 0.0156 
9. Py 10. Py; 11. Py; 12. Pr 
13. Py 14. Py 15. Py; 16. Ps; 


Graphical Analysis In Exercises 17-22, find the indicated z-score(s) shown
in the graph.

[Graphs for Exercises 17-22: standard normal curves with shaded regions (two of the labeled areas are 0.3520 and 0.0233) and the unknown z-score(s) marked z = ?]


Finding a z-Score Given an Area In Exercises 23-30, find the indicated
z-score.


23. Find the z-score that has 15.6% of the distribution’s area to its right. 
24. Find the z-score that has 88.9% of the distribution’s area to its right. 
25. Find the z-score that has 43.5% of the distribution’s area to its left. 
26. Find the z-score that has 31.5% of the distribution’s area to its left. 
27. Find the z-score that has 4.887% of the distribution’s area to its right. 
28. Find the z-score that has 93.1865% of the distribution’s area to its left. 


29. Find the positive z-score for which 70% of the distribution’s area lies
between -z and z.

30. Find the positive z-score for which 15% of the distribution’s area lies
between -z and z.




Using and Interpreting Concepts 


[FIGURE FOR EXERCISE 35: Undergraduate Grade Point Average; horizontal axis: Grade point average, with ticks at 3, 3.5, and 4]


Finding Specified Data Values In Exercises 31-38, answer the questions 
about the specified normal distribution. 


31. Incomes of CEOs In a survey of South African CEOs, the mean annual
income was $7.14 million with a standard deviation of $0.5 million. What
income represents the 90th percentile? (Adapted from Fin24)

(a) What income represents the 32nd percentile?

(b) What income represents the third quartile?

32. World Happiness In a recent study on world happiness, participants
were asked to evaluate their current lives on a scale from 0 to 10, where
0 represents the worst possible life and 10 represents the best possible life.
The mean response was 5.4 with a standard deviation of 2.2. (Source: The
Earth Institute, Columbia University)

(a) What response represents the 88th percentile?

(b) What response represents the 61st percentile?

(c) What response represents the first quartile?

33. Energy Consumption The per capita energy consumption level (in
kilowatt-hours) in Venezuela for a recent year can be approximated by a
normal distribution, as shown in the figure. (Source: Latin American Journal of
Economics)

(a) What consumption level represents the 5th percentile?
(b) What consumption level represents the 17th percentile?
(c) What consumption level represents the third quartile?

[FIGURE FOR EXERCISE 33: Per Capita Energy Consumption Level in Venezuela; μ = 2277 kWh, σ = 584.2 kWh; horizontal axis: Kilowatt-hours]

[FIGURE FOR EXERCISE 34: Water Footprint in the U.S.; μ = 1.64 Mgal, σ = 2.84 Mgal; horizontal axis: Mega gallons]

34. Water Footprint A water footprint is a measure of the appropriation of
fresh water. The per capita water footprint (in mega gallons) in the U.S. for
a recent year can be approximated by a normal distribution, as shown in the
figure. (Source: Water Resources Research)

(a) What water footprint represents the 80th percentile?

(b) What water footprint represents the 29th percentile?

(c) What water footprint represents the third quartile?

35. Undergraduate Grade Point Average The undergraduate grade point
averages (UGPA) of students taking the Law School Admission Test in a
recent year can be approximated by a normal distribution, as shown in the
figure. (Source: Law School Admission Council)

(a) What is the minimum UGPA that would still place a student in the top
5% of UGPAs?

(b) Between what two values does the middle 50% of the UGPAs lie?


[FIGURE FOR EXERCISE 36: GRE Analytical Writing Scores; μ = 3.5; horizontal axis: Score]

[FIGURE FOR EXERCISE 42: Final Exam Grades; horizontal axis: Points scored on final exam, with grade regions D, C, B, and A and a 10% region marked]

36. GRE Scores The test scores for the analytical writing section of the
Graduate Record Examination (GRE) can be approximated by a normal
distribution, as shown in the figure. (Source: Educational Testing Service)

(a) What is the maximum score that can be in the bottom 10% of scores?

(b) Between what two values does the middle 80% of the scores lie?

37. Red Blood Cell Count The red blood cell counts (in grams per deciliter) for
a population of adult females can be approximated by a normal distribution,
with a mean of 13.5 grams per deciliter and a standard deviation of 0.5 gram
per deciliter.

(a) What is the minimum red blood cell count that can be in the top 15% of
counts?

(b) What is the maximum red blood cell count that can be in the bottom
25% of counts?

38. Tire Life Span The life span (in kilometers) for a population of a new brand
of tires can be approximated by a normal distribution, with a mean of 80,000
kilometers and a standard deviation of 1,500 kilometers.

(a) What is the minimum life span that can be in the top 20% of life spans?

(b) What is the maximum life span that can be in the bottom 10% of life
spans?

39. Bags of Baby Corn The weights of bags of baby corn are normally
distributed, with a mean of 500 grams and a standard deviation of 25 grams.
Bags in the upper 2% are too heavy and must be repackaged. What is the
most a bag of baby corn can weigh and not need to be repackaged?

40. Writing a Guarantee You sell a brand of thermostat that has a life
expectancy that is normally distributed, with a mean life of 8.5 years and
a standard deviation of 0.75 year. You want to give a guarantee for free
replacement of thermostats that do not work well. You are willing to
replace approximately 15% of the thermostats. How should you word your
guarantee?


Extending Concepts 


41. Vending Machine A vending machine dispenses coffee into an eight-ounce
cup. The amounts of coffee dispensed into the cup are normally distributed,
with a standard deviation of 0.03 ounce. You can allow the cup to overflow
1% of the time. What amount should you set as the mean amount of coffee
to be dispensed?

42. History Grades In a large section of a history class, the points for the final
exam are normally distributed, with a mean of 72 and a standard deviation
of 9. Grades are assigned according to the rule below.

• The top 10% receive an A.
• The next 20% receive a B.
• The middle 40% receive a C.
• The next 20% receive a D.
• The bottom 10% receive an F.

Find the lowest score on the final exam that would qualify a student for
(a) an A, (b) a B, (c) a C, and (d) a D.


Birth Weights in America 


The National Center for Health Statistics (NCHS) keeps records of many health-related 
aspects of people, including the birth weights of all babies born in the United States. 

The birth weight of a baby is related to its gestation period (the time between conception 
and birth). For a given gestation period, the birth weights can be approximated by a normal 
distribution. The means and standard deviations of the birth weights for various gestation 
periods are shown in the table below. 

One of the many goals of the NCHS is to reduce the percentage of babies born with low 
birth weights. The figure below shows the percents of preterm births and low birth weights 
from 2007 to 2015. 


Gestation period     Mean birth weight   Standard deviation
Under 28 weeks       1.60 lb             0.76 lb
28 to 31 weeks       3.20 lb             1.02 lb
32 to 33 weeks       4.31 lb             0.97 lb
34 to 36 weeks       5.74 lb             1.13 lb
37 to 38 weeks       6.92 lb             1.07 lb
39 to 40 weeks       7.60 lb             0.99 lb
41 weeks             8.00 lb             0.99 lb
42 weeks and over    8.16 lb             1.12 lb

[Figure: Percent of preterm births and percent of low birth weights, 2007 to 2015 (Preterm = under 37 weeks; Low birth weight = under 5.5 pounds); horizontal axis: Year]

EXERCISES
1. The distributions of birth weights for three gestation periods are shown. Match each curve
with a gestation period. Explain your reasoning.
[Figure: three normal curves of birth weight; horizontal axis: Pounds]
2. What percent of the babies born within each gestation period have a low birth weight
(under 5.5 pounds)?
(a) 28 to 31 weeks (b) 32 to 33 weeks
(c) 39 to 40 weeks (d) 42 weeks and over
3. Describe the weights of the top 10% of the babies born within each gestation period.
(a) Under 28 weeks (b) 34 to 36 weeks
(c) 41 weeks (d) 42 weeks and over
4. For each gestation period, what is the probability that a baby will weigh between 6 and
9 pounds at birth?
(a) Under 28 weeks (b) 32 to 33 weeks
(c) 37 to 38 weeks (d) 41 weeks
5. A birth weight of less than 3.25 pounds is classified by the NCHS as a “very low birth
weight.” What is the probability that a baby has a very low birth weight for each gestation
period?
(a) Under 28 weeks (b) 28 to 31 weeks
(c) 32 to 33 weeks (d) 39 to 40 weeks




What You Should Learn 


» How to find sampling 
distributions and verify their 
properties 

» How to interpret the Central 
Limit Theorem 


» How to apply the Central Limit 
Theorem to find the probability 
of a sample mean 


Study Tip 


Sample means can vary from one another and can also vary from the
population mean. This type of variation is to be expected and is called
sampling error. You will learn more about this topic in Section 6.1.


5.4 Sampling Distributions and the Central Limit Theorem


SECTION 5.4 Sampling Distributions and the Central Limit Theorem 283 


Sampling Distributions • The Central Limit Theorem • Probability and
the Central Limit Theorem


Sampling Distributions 


In previous sections, you studied the relationship between the mean of a 
population and values of a random variable. In this section, you will study the 
relationship between a population mean and the means of random samples taken 
from the population. 


DEFINITION 


A sampling distribution is the probability distribution of a sample statistic
that is formed when random samples of size n are repeatedly taken from a
population. If the sample statistic is the sample mean, then the distribution
is the sampling distribution of sample means. Every sample statistic has a
sampling distribution.


Consider the Venn diagram below. The rectangle represents a large
population, and each circle represents a random sample of size n. Because the
sample entries can differ, the sample means can also differ. The mean of Random
Sample 1 is x̄₁; the mean of Random Sample 2 is x̄₂; and so on. The sampling
distribution of the sample means for samples of size n for this population consists
of x̄₁, x̄₂, x̄₃, and so on. If the samples are drawn with replacement, then an
infinite number of samples can be drawn from the population.


[Figure: Population with Mean μ and Standard Deviation σ, containing Random Samples 1 through 4 with sample means x̄₁, x̄₂, x̄₃, and x̄₄]


Properties of Sampling Distributions of Sample Means
1. The mean of the sample means $\mu_{\bar{x}}$ is equal to the population mean μ.

$$\mu_{\bar{x}} = \mu$$

2. The standard deviation of the sample means $\sigma_{\bar{x}}$ is equal to the population
standard deviation σ divided by the square root of the sample size n.

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

The standard deviation of the sampling distribution of the sample means
is called the standard error of the mean.




[Figure: Probability Histogram of Population of x; horizontal axis: Number of grocery shopping trips; vertical axis: Probability, P(x)]

Probability Distribution of Sample Means

x̄    f    Probability
1    1    1/16
2    2    2/16
3    3    3/16
4    4    4/16
5    3    3/16
6    2    2/16
7    1    1/16

[Figure: Probability Histogram of Sampling Distribution of x̄; horizontal axis: Sample mean; vertical axis: Probability, P(x̄)]


To explore this topic further, 
see Activity 5.4 on page 296. 


Study Tip
Review Section 4.1 to find the mean and standard deviation of a probability
distribution.


A Sampling Distribution of Sample Means 


The number of times four people go grocery shopping in a month is given 
by the population values {1,3,5,7}. A probability histogram for the data 
is shown at the left. You randomly choose two of the four people, with 
replacement. List all possible samples of size n = 2 and calculate the mean of 
each. These means form the sampling distribution of the sample means. Find 
the mean, variance, and standard deviation of the sample means. Compare
your results with the mean μ = 4, variance σ² = 5, and standard deviation
σ = √5 ≈ 2.2 of the population.


SOLUTION 
List all 16 samples of size 2 from the population and the mean of each sample. 


Sample   Sample mean, x̄   Sample   Sample mean, x̄
1, 1     1                5, 1     3
1, 3     2                5, 3     4
1, 5     3                5, 5     5
1, 7     4                5, 7     6
3, 1     2                7, 1     4
3, 3     3                7, 3     5
3, 5     4                7, 5     6
3, 7     5                7, 7     7


After constructing a probability distribution of the sample means, you can 
graph the sampling distribution using a probability histogram as shown at 
the left. Notice that the shape of the histogram is bell-shaped and symmetric, 
similar to a normal curve. The mean, variance, and standard deviation of the 
16 sample means are 


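\( \mu_{\bar{x}} = 4 \)    Mean of the sample means

\( \sigma_{\bar{x}}^2 = \frac{5}{2} = 2.5 \)    Variance of the sample means

and

\( \sigma_{\bar{x}} = \sqrt{\frac{5}{2}} = \sqrt{2.5} \approx 1.6 \)    Standard deviation of the sample means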


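These results satisfy the properties of sampling distributions because
\( \mu_{\bar{x}} = \mu = 4 \)
and
\( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{\sqrt{5}}{\sqrt{2}} \approx 1.6 \).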
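The enumeration in this example can be reproduced in a few lines. The sketch below lists all 16 samples of size 2 (with replacement) from {1, 3, 5, 7}, treats their means as a population, and compares the results with \( \mu \) and \( \sigma/\sqrt{n} \). The variable names are illustrative only.

```python
import math
from itertools import product
from statistics import mean, pstdev, pvariance

population = [1, 3, 5, 7]  # monthly grocery trips for the four people
n = 2                      # sample size; sampling is with replacement

# All 4**2 = 16 ordered samples of size 2, and the mean of each
samples = list(product(population, repeat=n))
sample_means = [sum(s) / n for s in samples]

# Treat the 16 sample means as a population and summarize them
mu_xbar = mean(sample_means)
var_xbar = pvariance(sample_means)
sd_xbar = pstdev(sample_means)

# Compare with the population: mu = 4, sigma = sqrt(5)
sigma = pstdev(population)
print(len(samples), mu_xbar, var_xbar, round(sd_xbar, 2), round(sigma / math.sqrt(n), 2))
```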
TRY IT YOURSELF 1 


List all possible samples of size \( n = 3 \), with replacement, from the population
{1, 3, 5}. Calculate the mean of each sample. Find the mean, variance, and
standard deviation of the sample means. Compare your results with the mean
\( \mu = 3 \), variance \( \sigma^2 = 8/3 \), and standard deviation \( \sigma = \sqrt{8/3} \approx 1.6 \) of the
population.




Answer: Page A35 


The distribution of sample means has the same mean as the population, but its standard deviation is less than the standard deviation of the population. This tells you that the distribution of sample means has the same center as the population, but it is not as spread out.

Moreover, the distribution of sample means becomes less and less spread out (tighter concentration about the mean) as the sample size \( n \) increases.
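To see the last point numerically, the sketch below reuses the four-person population {1, 3, 5, 7} and enumerates every sample (with replacement) for n = 1 through 4. The helper name `sd_of_sample_means` is made up for this illustration, and exhaustive enumeration is only practical because the population is tiny.

```python
from itertools import product
from statistics import pstdev

population = [1, 3, 5, 7]  # the four-person grocery-trip population

def sd_of_sample_means(pop, n):
    """Standard deviation of the means of all len(pop)**n samples (with replacement)."""
    means = [sum(s) / n for s in product(pop, repeat=n)]
    return pstdev(means)

# The spread shrinks like sigma / sqrt(n), where sigma = sqrt(5) here
for n in range(1, 5):
    print(n, round(sd_of_sample_means(population, n), 3))
```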


SECTION 5.4 Sampling Distributions and the Central Limit Theorem 285 


The Central Limit Theorem 


The Central Limit Theorem forms the foundation for the inferential branch 
of statistics. This theorem describes the relationship between the sampling 
distribution of sample means and the population that the samples are taken 
from. The Central Limit Theorem is an important tool that provides the 
information you will need to use sample statistics to make inferences about a 
population mean. 


The Central Limit Theorem 


1. If random samples of size \( n \), where \( n \geq 30 \), are drawn from any population with a mean \( \mu \) and a standard deviation \( \sigma \), then the sampling distribution of sample means approximates a normal distribution. The greater the sample size, the better the approximation. (See figures for “Any Population Distribution” below.)
2. If random samples of size \( n \) are drawn from a population that is normally distributed, then the sampling distribution of sample means is normally distributed for any sample size \( n \). (See figures for “Normal Population Distribution” below.)


In either case, the sampling distribution of sample means has a mean equal 
to the population mean. 


\( \mu_{\bar{x}} = \mu \)    Mean of the sample means


The sampling distribution of sample means has a variance equal to \( 1/n \) times
the variance of the population and a standard deviation equal to the
population standard deviation divided by the square root of \( n \).

\( \sigma_{\bar{x}}^2 = \frac{\sigma^2}{n} \)    Variance of the sample means

\( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \)    Standard deviation of the sample means


Recall that the standard deviation of the sampling distribution of the sample
means, \( \sigma_{\bar{x}} \), is also called the standard error of the mean.
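A quick simulation can make the theorem concrete. The sketch below assumes a hypothetical exponential population (mean 1, standard deviation 1), which is strongly skewed rather than normal, draws many samples of size 30 with Python's `random` module, and compares the mean and standard deviation of the simulated sample means with \( \mu \) and \( \sigma/\sqrt{n} \). It also reports the share of sample means within 1.96 standard errors of \( \mu \), which should be near 95% if the normal approximation holds. The population, seed, and number of repetitions are arbitrary choices for illustration.

```python
import math
import random
from statistics import mean, pstdev

random.seed(1)  # fixed seed so that repeated runs give the same numbers

# Hypothetical skewed population: exponential with mean 1 and standard deviation 1
mu, sigma = 1.0, 1.0
n = 30          # sample size
reps = 20_000   # number of random samples drawn

sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(reps)
]

mu_hat = mean(sample_means)
sd_hat = pstdev(sample_means)
se = sigma / math.sqrt(n)

# Share of sample means within 1.96 standard errors of mu (about 0.95 for a normal curve)
within = sum(abs(m - mu) < 1.96 * se for m in sample_means) / reps

print(f"mean of sample means: {mu_hat:.3f} (theory {mu})")
print(f"sd of sample means:   {sd_hat:.3f} (theory {se:.3f})")
print(f"share within 1.96 SE: {within:.3f}")
```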


[Figure: 1. Any Population Distribution (mean and standard deviation marked) and its Distribution of Sample Means, \( n \geq 30 \), with \( \mu_{\bar{x}} = \mu \) and a smaller standard deviation of the sample means. 2. Normal Population Distribution and its Distribution of Sample Means (any \( n \)), with \( \mu_{\bar{x}} = \mu \) and a smaller standard deviation of the sample means.]




Interpreting the Central Limit Theorem 


A study analyzed the sleep habits of college students. The study found that the 
mean sleep time was 6.8 hours, with a standard deviation of 1.4 hours. Random 
samples of 100 sleep times are drawn from this population, and the mean 
of each sample is determined. Find the mean and standard deviation of the 
sampling distribution of sample means. Then sketch a graph of the sampling 
distribution. (Adapted from The Journal of American College Health) 


[Figure: Distribution for All Sleep Times; horizontal axis: Individual sleep times (in hours)]


SOLUTION 


The mean of the sampling distribution is equal to the population mean, and
the standard deviation of the sample means is equal to the population standard
deviation divided by \( \sqrt{n} \). So,

\( \mu_{\bar{x}} = \mu = 6.8 \)    Mean of the sample means

and

\( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1.4}{\sqrt{100}} = 0.14 \)    Standard deviation of the sample means


Interpretation From the Central Limit Theorem, because the sample size is 
greater than 30, the sampling distribution can be approximated by a normal 
distribution with a mean of 6.8 hours and a standard deviation of 0.14 hour, as 
shown in the figure. 
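The arithmetic in this example reduces to two lines. A minimal Python sketch, where `sampling_distribution_of_mean` is a hypothetical helper name rather than a library function:

```python
import math

def sampling_distribution_of_mean(mu, sigma, n):
    """Return (mean, standard deviation) of the sampling distribution of sample means."""
    return mu, sigma / math.sqrt(n)

mu_xbar, sigma_xbar = sampling_distribution_of_mean(6.8, 1.4, 100)
print(mu_xbar, round(sigma_xbar, 2))  # 6.8 0.14
```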


[Figure: Distribution of Sample Means with n = 100; horizontal axis: Mean of 100 sleep times (in hours)]


TRY IT YOURSELF 2 


Random samples of size 64 are drawn from the population in Example 2. 
Find the mean and standard deviation of the sampling distribution of sample 
means. Then sketch a graph of the sampling distribution and compare it with 
the sampling distribution in Example 2. 

Answer: Page A35 


Picturing the World

In a recent year, there were about 4.2 million parents in the
United States who received child
support payments. The histogram
shows the distribution of children
per custodial parent. The mean
number of children was 1.7 and
the standard deviation was 0.8.
(Adapted from U.S. Census Bureau)


[Figure: Child Support; probability histogram with horizontal axis Number of children (1 to 7) and vertical axis Probability]


You randomly select 35 parents who receive child support and
ask how many children in their custody are receiving child support
payments. What is the probability that the mean of the sample is
between 1.5 and 1.9 children?




Interpreting the Central Limit Theorem 


Assume the training heart rates of all 20-year-old athletes are normally 
distributed, with a mean of 135 beats per minute and a standard deviation 
of 18 beats per minute, as shown in the figure. Random samples of size 4 are 
drawn from this population, and the mean of each sample is determined. Find 
the mean and standard deviation of the sampling distribution of sample means. 
Then sketch a graph of the sampling distribution. 


[Figure: Distribution of Population Training Heart Rates; horizontal axis: Rate (in beats per minute)]


SOLUTION
\( \mu_{\bar{x}} = \mu = 135 \) beats per minute    Mean of the sample means
and
\( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{18}{\sqrt{4}} = 9 \) beats per minute    Standard deviation of the sample means


Interpretation From the Central Limit Theorem, because the population is 
normally distributed, the sampling distribution of the sample means is also 
normally distributed, as shown in the figure. 
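When sketching the two curves, it can help to compare where most of the values lie. This sketch (variable names are illustrative) computes \( \sigma_{\bar{x}} \) and the intervals within two standard deviations of the mean, first for the population and then for the sample means:

```python
import math

mu, sigma, n = 135, 18, 4   # beats per minute, sample size 4

sigma_xbar = sigma / math.sqrt(n)    # standard error: 18 / 2 = 9

# Ranges within two standard deviations of the mean, useful when sketching the curves
population_range = (mu - 2 * sigma, mu + 2 * sigma)
sample_mean_range = (mu - 2 * sigma_xbar, mu + 2 * sigma_xbar)
print(sigma_xbar, population_range, sample_mean_range)
```

The sample-mean interval is half as wide as the population interval because \( \sqrt{4} = 2 \).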


[Figure: Distribution of Sample Means with n = 4; horizontal axis: Mean rate (in beats per minute)]


TRY IT YOURSELF 3 


The diameters of fully grown white oak trees are normally distributed, with a mean 
of 3.5 feet and a standard deviation of 0.2 foot, as shown in the figure. Random 
samples of size 16 are drawn from this population, and the mean of each sample is 
determined. Find the mean and standard deviation of the sampling distribution of 
sample means. Then sketch a graph of the sampling distribution. 


[Figure: Distribution of Population Diameters; horizontal axis: Diameter (in feet), 2.9 to 4.1]


Answer: Page A35 




[Figure: Distribution of Sample Means with n = 50, shaded between 19.4 and 22.5 (horizontal axis: Mean distance (in miles)); z-Score Distribution of Sample Means with n = 50, shaded between z = -1.41 and z = 1.96]


Probability and the Central Limit Theorem 


In Section 5.2, you learned how to find the probability that a random variable \( x \)
will lie in a given interval of population values. In a similar manner, you can find
the probability that a sample mean \( \bar{x} \) will lie in a given interval of the \( \bar{x} \) sampling
distribution. To transform \( \bar{x} \) to a z-score, you can use the formula

\( z = \frac{\text{Value} - \text{Mean}}{\text{Standard error}} = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \).
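The formula translates directly into a small function. The name `z_score_of_sample_mean` is hypothetical; the usage line applies it to the values from the example that follows (\( \bar{x} = 19.4 \), \( \mu = 20.7 \), \( \sigma = 6.5 \), \( n = 50 \)).

```python
import math

def z_score_of_sample_mean(xbar, mu, sigma, n):
    """z-score of a sample mean: (xbar - mu) / (sigma / sqrt(n))."""
    standard_error = sigma / math.sqrt(n)
    return (xbar - mu) / standard_error

print(round(z_score_of_sample_mean(19.4, 20.7, 6.5, 50), 2))
```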


Finding Probabilities for Sampling Distributions 


The figure at the right shows the mean distances traveled by drivers each day. You randomly select 50 drivers ages 16 to 19. What is the probability that the mean distance traveled each day is between 19.4 and 22.5 miles? Assume \( \sigma = 6.5 \) miles.

[Figure: Miles to go. The average miles driven each day, by age group (16-19 through 65-74). Source: American Automobile Association]

SOLUTION
The sample size is greater than 30, so you can use the Central Limit Theorem to conclude that the distribution of sample means is approximately normal, with a mean and a standard deviation of

\( \mu_{\bar{x}} = \mu = 20.7 \) miles and \( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{6.5}{\sqrt{50}} \approx 0.9 \) mile.


The graph of this distribution is shown at the left with a shaded area between
19.4 and 22.5 miles. The z-scores that correspond to sample means of 19.4 and
22.5 miles are found as shown.

\( z = \frac{19.4 - 20.7}{6.5/\sqrt{50}} \approx -1.41 \)    Convert 19.4 to z-score

\( z = \frac{22.5 - 20.7}{6.5/\sqrt{50}} \approx 1.96 \)    Convert 22.5 to z-score

So, the probability that the mean distance driven each day by the sample of
50 people is between 19.4 and 22.5 miles is

\( P(19.4 < \bar{x} < 22.5) = P(-1.41 < z < 1.96) \)
\( = P(z < 1.96) - P(z < -1.41) \)
\( = 0.9750 - 0.0793 \)
\( = 0.8957 \).
Interpretation Of all samples of 50 drivers ages 16 to 19, about 90% will drive
a mean distance each day between 19.4 and 22.5 miles, as shown in the graph
at the left. This implies that, assuming the value of \( \mu = 20.7 \) is correct, about
10% of such sample means will lie outside the given interval.
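The table lookup can be checked numerically. The sketch below writes the standard normal CDF in terms of `math.erf` from Python's standard library, \( \Phi(z) = \tfrac{1}{2}[1 + \operatorname{erf}(z/\sqrt{2})] \); `normal_cdf` is a helper defined here, not a library function. Because it uses unrounded z-scores, its result (about 0.896) differs slightly from the table answer 0.8957.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, Phi(z), written in terms of math.erf."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma, n = 20.7, 6.5, 50
se = sigma / math.sqrt(n)          # standard error of the mean

z_low = (19.4 - mu) / se
z_high = (22.5 - mu) / se
p = normal_cdf(z_high) - normal_cdf(z_low)   # P(19.4 < xbar < 22.5)
print(round(z_low, 2), round(z_high, 2), round(p, 4))
```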


Study Tip
Before you find probabilities for intervals of the sample mean \( \bar{x} \), use the Central Limit Theorem to determine the mean and the standard deviation of the sampling distribution of the sample means. That is, calculate \( \mu_{\bar{x}} \) and \( \sigma_{\bar{x}} \).


[Figure: Distribution of Sample Means with n = 9, centered at the mean 10,453; horizontal axis: Mean room and board (in dollars), 8500 to 12,500]




TRY IT YOURSELF 4 


You randomly select 100 drivers ages 16 to 19 from Example 4. What is the
probability that the mean distance traveled each day is between 19.4 and
22.5 miles? Use \( \mu = 20.7 \) miles and \( \sigma = 6.5 \) miles.


Answer: Page A35 


Finding Probabilities for Sampling Distributions 


The mean room and board expense per year at four-year colleges is $10,453. 
You randomly select 9 four-year colleges. What is the probability that the 
mean room and board is less than $10,750? Assume that the room and board 
expenses are normally distributed with a standard deviation of $1650. (Adapted 
from National Center for Education Statistics) 


SOLUTION 


Because the population is normally distributed, you can use the Central
Limit Theorem to conclude that the distribution of sample means is normally
distributed, with a mean and a standard deviation of

\( \mu_{\bar{x}} = \mu = \$10{,}453 \) and \( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{\$1650}{\sqrt{9}} = \$550 \).

The graph of this distribution is shown at the left. The area to the left of
$10,750 is shaded. The z-score that corresponds to $10,750 is

\( z = \frac{10{,}750 - 10{,}453}{1650/\sqrt{9}} = \frac{297}{550} = 0.54 \).

So, the probability that the mean room and board expense is less than $10,750 is
\( P(\bar{x} < 10{,}750) = P(z < 0.54) = 0.7054 \).


You can check this answer using technology. For instance, you can use a
TI-84 Plus to find the probability, as shown below.

TI-84 PLUS
normalcdf(…, 10750, 10453, 550)


Interpretation So, about 71% of such samples with n = 9 will have a mean 
less than $10,750 and about 29% of these sample means will be greater than 
$10,750. 
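Python's standard library offers a check similar to the TI-84: `statistics.NormalDist(mu, sigma).cdf(x)` (Python 3.8 or later) returns \( P(X < x) \) for a normal distribution. A minimal sketch for this example, assuming that module is available:

```python
import math
from statistics import NormalDist

mu, sigma, n = 10_453, 1650, 9
se = sigma / math.sqrt(n)             # 1650 / 3 = 550

# Sampling distribution of the mean: normal with mean mu and standard deviation se
sampling_dist = NormalDist(mu=mu, sigma=se)
p = sampling_dist.cdf(10_750)         # P(xbar < 10,750)
print(se, round(p, 4))
```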


TRY IT YOURSELF 5 


The average sales price of a single-family house in the United States is 
$235,500. You randomly select 12 single-family houses. What is the probability 
that the mean sales price is more than $225,000? Assume that the sales prices 
are normally distributed with a standard deviation of $50,000. (Adapted from
National Association of Realtors) 


Answer: Page A35 




Study Tip
To find probabilities for individual members of a population with a normally distributed random variable \( x \), use the formula

\( z = \frac{x - \mu}{\sigma} \).

To find probabilities for the mean \( \bar{x} \) of a sample of size \( n \), use the formula

\( z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} \).


Excel: =NORM.DIST(1400, 1615, 550, TRUE) returns 0.347932217

Excel: =NORM.DIST(1400, 1615, 110, TRUE) returns 0.025318372


The Central Limit Theorem can also be used to investigate unusual events. 


An unusual event is one that occurs with a probability of less than 5%. 


Finding Probabilities for \( x \) and \( \bar{x} \)


Some college students use credit cards to pay for school-related expenses. For 
this population, the amount paid is normally distributed, with a mean of $1615 
and a standard deviation of $550. (Adapted from Sallie Mae/Ipsos Public Affairs) 


1. What is the probability that a randomly selected college student, who uses a 
credit card to pay for school-related expenses, paid less than $1400? 


2. You randomly select 25 college students who use credit cards to pay for 
school-related expenses. What is the probability that their mean amount 
paid is less than $1400? 


3. Compare the probabilities from parts 1 and 2. 
SOLUTION 


1. In this case, you are asked to find the probability associated with a certain
value of the random variable \( x \). The z-score that corresponds to \( x = \$1400 \) is

\( z = \frac{x - \mu}{\sigma} = \frac{1400 - 1615}{550} = \frac{-215}{550} \approx -0.39 \).

So, the probability that the student paid less than $1400 is

\( P(x < 1400) = P(z < -0.39) = 0.3483 \).


You can check this answer using technology. For instance, you can use 
Excel to find the probability, as shown at the left. (The answer differs 
slightly due to rounding.) 


2. Here, you are asked to find the probability associated with a sample mean
\( \bar{x} \). The z-score that corresponds to \( \bar{x} = \$1400 \) is

\( z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{1400 - 1615}{550/\sqrt{25}} = \frac{-215}{110} \approx -1.95 \).

So, the probability that the mean credit card balance of the 25 card holders
is less than $1400 is

\( P(\bar{x} < 1400) = P(z < -1.95) = 0.0256 \).


You can check this answer using technology. For instance, you can use 
Excel to find the probability, as shown at the left. (The answer differs 
slightly due to rounding.) 


3. Interpretation Although there is about a 35% chance that a college student 
who uses a credit card to pay for school-related expenses will pay less than 
$1400, there is only about a 3% chance that the mean amount a sample of 
25 college students will pay is less than $1400. Because there is only a 3% 
chance that the mean amount a sample of 25 college students will pay is less 
than $1400, this is an unusual event. 
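The same `statistics.NormalDist` class (Python 3.8+) can reproduce both parts of this example side by side, giving values that agree with the Excel results for this example. The 0.05 cutoff in the last line is the text's definition of an unusual event; variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 1615, 550, 25   # dollars; sample of 25 students

# Part 1: a single student, x ~ Normal(mu, sigma)
p_individual = NormalDist(mu, sigma).cdf(1400)

# Part 2: mean of 25 students, x-bar ~ Normal(mu, sigma / sqrt(n))
p_mean = NormalDist(mu, sigma / math.sqrt(n)).cdf(1400)

print(round(p_individual, 4), round(p_mean, 4))
print("unusual" if p_mean < 0.05 else "not unusual")
```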


TRY IT YOURSELF 6 


A consumer price analyst claims that prices for liquid crystal display (LCD) 
computer monitors are normally distributed, with a mean of $190 and a 
standard deviation of $48. What is the probability that a randomly selected 
LCD computer monitor costs less than $200? You randomly select 10 LCD 
computer monitors. What is the probability that their mean cost is less than 
$200? Compare these two probabilities. Answer: Page A35 




5.4 EXERCISES


For Extra Help: MyLab Statistics 


Building Basic Skills and Vocabulary 


In Exercises 1-4, a population has a mean \( \mu \) and a standard deviation \( \sigma \). Find the
mean and standard deviation of the sampling distribution of sample means with
sample size \( n \).

1. \( \mu = 225 \), \( \sigma = 40 \), \( n = 75 \)
2. \( \mu = 99 \), \( \sigma = 12 \), \( n = 225 \)
3. \( \mu = 1022 \), \( \sigma = 144 \), \( n = 360 \)
4. \( \mu = 4848 \), \( \sigma = 24 \), \( n = 1200 \)


True or False? In Exercises 5-8, determine whether the statement is true or
false. If it is false, rewrite it as a true statement. 


5. As the sample size increases, the mean of the distribution of sample means 
increases. 


6. As the sample size increases, the standard deviation of the distribution of 
sample means increases. 


7. A sampling distribution is normal only when the population is normal. 


8. If the sample size is at least 30, then you can use z-scores to determine 
the probability that a sample mean falls in a given interval of the sampling 
distribution. 


Graphical Analysis In Exercises 9 and 10, the graph of a population
distribution is shown with its mean and standard deviation. Random samples of
size 100 are drawn from the population. Determine which of the figures labeled
(a)-(c) would most closely resemble the sampling distribution of sample means.
Explain your reasoning.


9. The waiting time (in seconds) to turn left at an intersection

[Figure: population distribution (relative frequency vs. time in seconds) and candidate sampling distributions (a)-(c), each plotted against time (in seconds)]




10. The annual snowfall (in feet) for a central New York state county

[Figure: population distribution with \( \mu = 5.8 \) and \( \sigma = 2.3 \) (probability vs. snowfall in feet), and candidate sampling distributions (a)-(c), each plotted against snowfall (in feet)]


A Sampling Distribution of Sample Means In Exercises 11-14, a
population and sample size are given.

(a) Find the mean and standard deviation of the population.

(b) List all samples (with replacement) of the given size from the population and find the mean of each.

(c) Find the mean and standard deviation of the sampling distribution of sample means and compare them with the mean and standard deviation of the population.

11. The load-bearing capacities (in thousands of pounds) of five transmission
line insulators are 64, 48, 19, 79, and 56. Use a sample size of 2.

12. The diameters (in inches) of four machine parts are 1.000, 1.004, 1.001, and
1.003. Use a sample size of 2.

13. The melting points (in degrees Celsius) of three industrial lubricants are 350,
399, and 418. Use a sample size of 3.

14. The lifetimes (in hours) of four diamond-tipped cutting tools are 70, 85, 81,
and 67. Use a sample size of 3.


Finding Probabilities In Exercises 15-18, the population mean and standard
deviation are given. Find the indicated probability and determine whether the
given sample mean would be considered unusual.

15. For a random sample of \( n = 49 \), find the probability of a sample mean being
less than 37.2 when \( \mu = 38 \) and \( \sigma = 2.17 \).

16. For a random sample of \( n = 225 \), find the probability of a sample mean
being greater than 37.2 when \( \mu = 38 \) and \( \sigma = 2.17 \).

17. For a random sample of \( n = 60 \), find the probability of a sample mean being
greater than 132 when \( \mu = 130 \) and \( \sigma = 16.1 \).

18. For a random sample of \( n = 25 \), find the probability of a sample mean being
less than 100 or greater than 102 when \( \mu = 100 \) and \( \sigma = 4.5 \).




Using and Interpreting Concepts 


Interpreting the Central Limit Theorem In Exercises 19-26, find the
mean and standard deviation of the indicated sampling distribution of sample
means. Then sketch a graph of the sampling distribution.

19. SAT Critical Reading Scores: Males The scores for males on the critical
reading portion of the SAT in 2016 are normally distributed, with a mean of
495 and a standard deviation of 120. Random samples of size 20 are drawn
from this population, and the mean of each sample is determined. (Source:
The College Board)

20. SAT Critical Reading Scores: Females The scores for females on the critical
reading portion of the SAT in 2016 are normally distributed, with a mean of
493 and a standard deviation of 114. Random samples of size 36 are drawn
from this population, and the mean of each sample is determined. (Source:
The College Board)

21. Temperature The monthly growing season temperatures across villages
in Tanzania are normally distributed, with a mean of 23°C and a standard
deviation of 1.3°C. Random samples of size 25 are drawn from this
population, and the mean of each sample is determined. (Source: Agricultural
and Applied Economics Association)

22. Precipitation The monthly growing season precipitation across villages
in Tanzania is normally distributed, with a mean of 87 centimeters and a
standard deviation of 14.5 centimeters. Random samples of size 30 are drawn
from this population, and the mean of each sample is determined. (Source:
Agricultural and Applied Economics Association)

23. Water Footprint A water footprint is a measure of the appropriation of
fresh water. The per capita water footprint in the United States for a recent
year is approximately normally distributed, with a mean of 1.64 mega
gallons and a standard deviation of 2.89 mega gallons. Random samples
of size 12 are drawn from this population, and the mean of each sample is
determined. (Source: Water Resources Research)

24. Water Use in Hospitals The amounts of cold water for patient consumption
in hospitals in Spain are normally distributed, with a mean of 196 cubic
meters per bed and a standard deviation of 70 cubic meters per bed. Random
samples of size 15 are drawn from this population, and the mean of each
sample is determined. (Source: Journal of Healthcare Engineering)

25. Salaries The annual salary for senior-level chemical engineers is normally
distributed, with a mean of about $132,000 and a standard deviation of about
$18,000. Random samples of 35 are drawn from this population, and the
mean of each sample is determined. (Adapted from Salary.com)

26. Salaries The annual salary for clinical pharmacists is normally distributed,
with a mean of about $111,000 and a standard deviation of about $13,000.
Random samples of 48 are drawn from this population, and the mean of each
sample is determined. (Adapted from Salary.com)

27. Repeat Exercise 19 for samples of size 15 and 10. What happens to the mean
and the standard deviation of the distribution of sample means as the sample
size decreases?

28. Repeat Exercise 20 for samples of size 18 and 12. What happens to the mean
and the standard deviation of the distribution of sample means as the sample
size decreases?




Finding Probabilities for Sampling Distributions In Exercises 29-32,
find the indicated probability and interpret the results.

29. Dow Jones Industrial Average From 1975 through 2016, the mean gain of
the Dow Jones Industrial Average was 456. A random sample of 32 years is
selected from this population. What is the probability that the mean gain for
the sample was between 200 and 500? Assume \( \sigma = 1215 \).

30. Standard & Poor’s 500 From 1871 through 2016, the mean return of the
Standard & Poor’s 500 was 10.72%. A random sample of 38 years is selected
from this population. What is the probability that the mean return for the
sample was between 9.1% and 10.3%? Assume \( \sigma = 18.60\% \).

31. Childhood Asthma Prevalence The mean percent of childhood asthma
prevalence of 43 Chinese cities is 2.25%. A random sample of 30 Chinese
cities is selected. What is the probability that the mean childhood asthma
prevalence for the sample is greater than 2.6%? Assume \( \sigma = 1.30\% \). (Source:
BioMed Research International)

32. Carbon Dioxide Emissions The mean per capita carbon dioxide emissions
in 58 industrialized countries over a 22-year period is 25.5 metric tons. A
random sample of 44 countries is selected. What is the probability that the
mean carbon dioxide emissions for the sample is greater than 28 metric tons?
Assume \( \sigma = 395.1 \) metric tons. (Source: Energy Reports)

33. Which Is More Likely? Assume that the childhood asthma prevalences in
Exercise 31 are normally distributed. Are you more likely to randomly select
1 city with childhood asthma prevalence less than 3.2% or to randomly select
a sample of 10 cities with a mean childhood asthma prevalence less than
3.2%? Explain.

34. Which Is More Likely? Assume that the carbon dioxide emissions in
Exercise 32 are normally distributed. Are you more likely to randomly
select 1 country with carbon dioxide emissions less than 30 metric tons
or to randomly select a sample of 15 countries with mean carbon dioxide
emissions less than 30 metric tons? Explain.

35. Paint Cans A machine is set to fill paint cans with a mean of 5 kilograms
and a standard deviation of 0.02 kilograms. A random sample of 75 cans has
a mean of 4.99 kilograms. The machine needs to be reset when the mean of
a random sample is unusual. Does the machine need to be reset? Explain.

36. Chocolate Cookies A machine is set to pack chocolate cookies with a mean of
125 grams and a standard deviation of 4 grams. A random sample of 16 packs
has a mean of 125.5 grams. The machine needs to be reset when the mean of a
random sample is unusual. Does the machine need to be reset? Explain.

37. Cloth Cutter The lengths of cloth a machine cuts for making dresses are
normally distributed, with a mean of 3 meters and a standard deviation of
0.25 meters.

(a) What is the probability that a randomly selected cloth cut by the machine
has a length greater than 3.2 meters?

(b) You randomly select 60 cloth pieces. What is the probability that their
mean length is greater than 3.2 meters?

38. Muffins The weights of muffin cartons are normally distributed with a
mean weight of 1.25 kilograms and a standard deviation of 0.1 kilograms.

(a) What is the probability that a randomly selected carton has a weight
greater than 1.3 kilograms?

(b) You randomly select 100 cartons. What is the probability that their mean
weight is greater than 1.3 kilograms?




Extending Concepts 


Finite Correction Factor The formula for the standard deviation of the
sampling distribution of sample means

\( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \)

given in the Central Limit Theorem is based on an assumption that the population
has infinitely many members. This is the case whenever sampling is done with
replacement (each member is put back after it is selected), because the sampling
process could be continued indefinitely. The formula is also valid when the sample
size is small in comparison with the population. When sampling is done without
replacement and the sample size \( n \) is more than 5% of the finite population of size
\( N \) (\( n/N > 0.05 \)), however, there is a finite number of possible samples. A finite
correction factor,

\( \sqrt{\frac{N - n}{N - 1}} \)

should be used to adjust the standard deviation. The sampling distribution of the
sample means will be normal with a mean equal to the population mean, and the
standard deviation will be

\( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} \).
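A sketch of the standard error with the optional finite correction, assuming the 5% rule stated above decides when to apply it. The function name and the numbers in the usage lines are hypothetical, chosen only to exercise both branches.

```python
import math

def standard_error(sigma, n, N=None):
    """Standard deviation of the sample means.

    If a finite population size N is given and n/N > 0.05 (sampling without
    replacement), multiply sigma/sqrt(n) by the finite correction factor
    sqrt((N - n) / (N - 1)); otherwise return sigma/sqrt(n).
    """
    se = sigma / math.sqrt(n)
    if N is not None and n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se

print(standard_error(10, 25))          # no population size given: 10 / 5
print(standard_error(10, 25, N=1000))  # n/N = 0.025, so no correction
print(standard_error(10, 25, N=100))   # n/N = 0.25, so the correction applies
```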


In Exercises 39 and 40, determine whether the finite correction factor should be 
used. If so, use it in your calculations when you find the probability. 




39. Parking Infractions In a sample of 1000 fines issued by the city of Toronto for 
parking infractions, the mean fine was $47.12 and the standard deviation was 
$48.24. A random sample of size 55 is selected from this population. What is the 
probability that the mean fine is less than $50? (Adapted from City of Toronto) 


40. Old Faithful In a sample of 100 eruptions of the Old Faithful geyser 
at Yellowstone National Park, the mean interval between eruptions was 
101.56 minutes and the standard deviation was 42.69 minutes. A random 
sample of size 30 is selected from this population. What is the probability 
that the mean interval between eruptions is between 95 minutes and 
110 minutes? (Adapted from Geyser Times) 


Sampling Distribution of Sample Proportions For a random sample 
of size n, the sample proportion is the number of individuals in the sample with a 
specified characteristic divided by the sample size. The sampling distribution of 
sample proportions is the distribution formed when sample proportions of size n 
are repeatedly taken from a population where the probability of an individual with 
a specified characteristic is \( p \). The sampling distribution of sample proportions
has a mean equal to the population proportion \( p \) and a standard deviation equal
to \( \sqrt{pq/n} \). In Exercises 41 and 42, assume the sampling distribution of sample
proportions is a normal distribution.
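The two parameters of the sampling distribution of sample proportions can be computed as follows. The helper `sample_proportion_distribution` and the values \( p = 0.5 \), \( n = 100 \) are illustrative assumptions, not taken from the exercises.

```python
import math

def sample_proportion_distribution(p, n):
    """Mean and standard deviation of the sampling distribution of sample proportions."""
    q = 1 - p                       # probability an individual lacks the characteristic
    return p, math.sqrt(p * q / n)

mean_phat, sd_phat = sample_proportion_distribution(0.5, 100)
print(mean_phat, round(sd_phat, 3))
```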


41. Construction About 63% of the residents in a town are in favor of building 
a new high school. One hundred five residents are randomly selected. What 
is the probability that the sample proportion in favor of building a new 
school is less than 55%? Interpret your result. 


42. Conservation About 74% of the residents in a town say that they are 
making an effort to conserve water or electricity. One hundred ten residents 
are randomly selected. What is the probability that the sample proportion 
making an effort to conserve water or electricity is greater than 80%? 
Interpret your result. 


ACTIVITY 


Sampling Distributions 


APPLET
You can find the interactive applet for this activity within MyLab Statistics or at www.pearsonglobaleditions.com.

The sampling distributions applet allows you to investigate sampling distributions 
by repeatedly taking random samples from a population. The top plot displays 
the distribution of a population. Several options are available for the population 
distribution (Uniform, Bell-shaped, Skewed, Binary, and Custom). When SAMPLE 
is clicked, N random samples of size n will be repeatedly selected from the 
population. The sample statistics specified in the bottom two plots will be updated 
for each sample. When N is set to 1 and n is less than or equal to 50, the display will
show, in an animated fashion, the points selected from the population dropping into 
the second plot and the corresponding summary statistic values dropping into the 
third and fourth plots. Click RESET to stop an animation and clear existing results. 
Summary statistics for each plot are shown in the panel at the left of the plot. 


[Applet screenshot: Population (can be changed with mouse), Uniform distribution on 0 to 50 with mean 25, median 25, and std. dev. 14.4338; SAMPLE and RESET controls with N = 1; plots of Sample data, Sample Means, and Sample Medians, each with a summary panel of mean, median, and std. dev.]


EXPLORE 


Step 1 Specify a distribution. 

Step 2 Specify values of n and N. 

Step 3 Specify what to display in the bottom two graphs. 
Step 4 Click SAMPLE to generate the sampling distributions. 


DRAW CONCLUSIONS 


1. Run the simulation using n = 30 and N = 10 for a uniform, a bell-shaped, and a 
skewed distribution. What is the mean of the sampling distribution of the sample 
means for each distribution? For each distribution, is this what you would expect? 


2. Run the simulation using n = 50 and N = 10 for a bell-shaped distribution. 
What is the standard deviation of the sampling distribution of the sample 
means? According to the formula, what should the standard deviation of the 
sampling distribution of the sample means be? Is this what you would expect? 




What You Should Learn
• How to determine when a normal distribution can approximate a binomial distribution
• How to find the continuity correction
• How to use a normal distribution to approximate binomial probabilities

Study Tip
Here are some properties of binomial experiments (see Section 4.2).
• \( n \) independent trials
• Two possible outcomes: success or failure
• Probability of success is \( p \); probability of failure is \( q = 1 - p \)
• \( p \) is the same for each trial

5.5 Normal Approximations to Binomial Distributions


SECTION 5.5 Normal Approximations to Binomial Distributions 297 


Approximating a Binomial Distribution • Continuity Correction
• Approximating Binomial Probabilities


Approximating a Binomial Distribution 


In Section 4.2, you learned how to find binomial probabilities. For instance, 
consider a surgical procedure that has an 85% chance of success. When a doctor 
performs this surgery on 10 patients, you can use the binomial formula to find 
the probability of exactly two successful surgeries. 

But what if the doctor performs the surgical procedure on 150 patients and 
you want to find the probability of fewer than 100 successful surgeries? To do this 
using the techniques described in Section 4.2, you would have to use the binomial 
formula 100 times and find the sum of the resulting probabilities. This approach 
is not practical, of course. A better approach is to use a normal distribution to 
approximate the binomial distribution. 
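To make the contrast concrete, the sketch below applies the binomial formula from Section 4.2 once for "exactly two of 10" and 100 times for "fewer than 100 of 150." It uses Python's `math.comb`; the helper name `binomial_probability` is our own. A computer handles the 100-term sum easily, but by hand it is exactly the impractical work the text describes.

```python
from math import comb

def binomial_probability(x, n, p):
    """Binomial formula from Section 4.2: P(x) = nCx * p^x * q^(n - x)."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

# Exactly two successful surgeries out of 10 (one use of the formula)
p_exactly_two = binomial_probability(2, 10, 0.85)

# Fewer than 100 successful surgeries out of 150: x = 0, 1, ..., 99,
# so the formula is applied 100 times and the results are added.
p_fewer_than_100 = sum(binomial_probability(x, 150, 0.85) for x in range(100))

print(p_exactly_two, p_fewer_than_100)
```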


Normal Approximation to a Binomial Distribution

If np ≥ 5 and nq ≥ 5, then the binomial random variable x is approximately
normally distributed, with mean

$$\mu = np$$

and standard deviation

$$\sigma = \sqrt{npq}$$

where n is the number of independent trials, p is the probability of success in
a single trial, and q is the probability of failure in a single trial.
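The rule in the box can be written as a small check. This is a minimal sketch; the function name `normal_approximation` and the choice to return `None` when the rule fails are our own conventions. The usage lines use the two surveys from Example 1.

```python
import math

def normal_approximation(n, p):
    """Return (mu, sigma) when np >= 5 and nq >= 5; otherwise return None,
    meaning the normal approximation should not be used."""
    q = 1 - p
    if n * p < 5 or n * q < 5:
        return None
    return n * p, math.sqrt(n * p * q)

print(normal_approximation(45, 0.47))  # np = 21.15, nq = 23.85: approximation applies
print(normal_approximation(20, 0.23))  # np = 4.6 < 5: prints None
```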


To see why a normal approximation is valid, look at the binomial distributions
for p = 0.25, q = 1 − 0.25 = 0.75, and n = 4, n = 10, n = 25, and n = 50
shown below. Notice that as n increases, the shape of the binomial distribution
becomes more similar to a normal distribution.


[Figure: binomial distributions P(x) for p = 0.25 with n = 4, n = 10, n = 25 (np = 6.25, nq = 18.75), and n = 50 (np = 12.5, nq = 37.5)]
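One way to put a number on "more similar to a normal distribution" is to find, for each n, the largest gap between the exact binomial cumulative probabilities and the normal cumulative probabilities with μ = np and σ = √(npq). The sketch below is our own illustration, not a method from the text: it evaluates the normal curve at x + 0.5 (the continuity correction described later in this section) and uses Python's `math.comb` and `statistics.NormalDist`. The gap should shrink as n grows.

```python
import math
from statistics import NormalDist

def max_cdf_gap(n, p):
    """Largest gap between the exact binomial cumulative probability
    P(X <= x) and the normal approximation at x + 0.5, for x = 0..n."""
    q = 1 - p
    normal = NormalDist(mu=n * p, sigma=math.sqrt(n * p * q))
    gap, cumulative = 0.0, 0.0
    for x in range(n + 1):
        cumulative += math.comb(n, x) * p**x * q**(n - x)  # exact P(X <= x)
        gap = max(gap, abs(cumulative - normal.cdf(x + 0.5)))
    return gap

for n in (4, 10, 25, 50):
    print(f"n = {n:2d}: largest gap = {max_cdf_gap(n, 0.25):.4f}")
```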




EXAMPLE 1

Approximating a Binomial Distribution


Two binomial experiments are listed. Determine whether you can use a normal 
distribution to approximate the distribution of x, the number of people who reply 
yes. If you can, find the mean and standard deviation. If you cannot, explain why. 


1. In a survey of 8- to 18-year-old heavy media users in the United States,
47% said they get fair or poor grades (C and below). You randomly select
forty-five 8- to 18-year-old heavy media users in the United States and ask
them whether they get fair or poor grades. (Source: Kaiser Family Foundation)

2. In a survey of 8- to 18-year-old light media users in the United States,
23% said they get fair or poor grades (C and below). You randomly select
twenty 8- to 18-year-old light media users in the United States and ask them
whether they get fair or poor grades. (Source: Kaiser Family Foundation)


SOLUTION 


1. In this binomial experiment, n = 45, p = 0.47, and q = 0.53. So,
np = 45(0.47) = 21.15
and
nq = 45(0.53) = 23.85.
Because np and nq are greater than 5, you can use a normal distribution with

$\mu = np = 21.15$

and

$\sigma = \sqrt{npq} = \sqrt{45(0.47)(0.53)} \approx 3.35$

to approximate the distribution of x. In the figure at the left, notice that
the binomial distribution is approximately bell-shaped, which supports
the conclusion that you can use a normal distribution to approximate the
distribution of x.

2. In this binomial experiment, n = 20, p = 0.23, and q = 0.77. So,
np = 20(0.23) = 4.6
and
nq = 20(0.77) = 15.4.
Because np < 5, you cannot use a normal distribution to approximate
the distribution of x. In the figure at the left, notice that the binomial
distribution is skewed right, which supports the conclusion that you cannot
use a normal distribution to approximate the distribution of x.


TRY IT YOURSELF 1 


A binomial experiment is listed. Determine whether you can use a normal 
distribution to approximate the distribution of x, the number of people who 
reply yes. If you can, find the mean and standard deviation. If you cannot, 
explain why. 


In a survey of adults in the United States, 29% said they have seen a 
person using a mobile device walk in front of a moving vehicle without 
looking. You randomly select 100 adults in the United States and ask 
them whether they have seen a person using a mobile device walk in 
front of a moving vehicle without looking. (Source: Consumer Reports) 
Answer: Page A36 


[Figure: top, the exact binomial probability P(x = c) shown as a histogram bar centered at c; bottom, the normal approximation P(c − 0.5 < x < c + 0.5)]


Study Tip

In a discrete distribution, there is a difference between P(x ≥ c) and
P(x > c). This is true because the probability that x is exactly c is not 0.
In a continuous distribution, however, there is no difference between
P(x ≥ c) and P(x > c) because the probability that x is exactly c is 0.




Continuity Correction 


A binomial distribution is discrete and can be represented by a probability 
histogram. To calculate exact binomial probabilities, you can use the binomial 
formula for each value of x and add the results. Geometrically, this corresponds 
to adding the areas of bars in the probability histogram (see top figure at the 
left). Remember that each bar has a width of one unit and x is the midpoint of 
the interval. 

When you use a continuous normal distribution to approximate a binomial 
probability, you need to move 0.5 unit to the left and right of the midpoint to 
include all possible x-values in the interval (see bottom figure at the left). When 
you do this, you are making a continuity correction. 


EXAMPLE 2

Using a Continuity Correction


Use a continuity correction to convert each binomial probability to a normal 
distribution probability. 


1. The probability of getting between 270 and 310 successes, inclusive 
2. The probability of getting at least 158 successes 
3. The probability of getting fewer than 63 successes 


SOLUTION 


1. The discrete midpoint values are 270, 271, . . ., 310. The corresponding 
interval for the continuous normal distribution is 269.5 < x < 310.5 and 
the normal distribution probability is P(269.5 < x < 310.5). 


2. The discrete midpoint values are 158, 159, 160, . . .. The corresponding 
interval for the continuous normal distribution is x > 157.5 and the normal 
distribution probability is P(x > 157.5). 


3. The discrete midpoint values are . . ., 60, 61, 62. The corresponding 
interval for the continuous normal distribution is x < 62.5 and the normal 
distribution probability is P(x < 62.5). 

TRY IT YOURSELF 2 


Use a continuity correction to convert each binomial probability to a normal 
distribution probability. 


1. The probability of getting between 57 and 83 successes, inclusive 


2. The probability of getting at most 54 successes 
Answer: Page A36 


Shown below are several cases of binomial probabilities involving the 
number c and how to convert each to a normal distribution probability. 


Binomial        Normal                       Notes
Exactly c       P(c − 0.5 < x < c + 0.5)     Includes c
At most c       P(x < c + 0.5)               Includes c
Fewer than c    P(x < c − 0.5)               Does not include c
At least c      P(x > c − 0.5)               Includes c
More than c     P(x > c + 0.5)               Does not include c
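The table can be read as a lookup from the wording of a binomial probability to the interval used for the normal probability. A minimal sketch, with function names of our own choosing; `None` stands for an unbounded end. The usage lines repeat Example 2.

```python
def normal_interval(case, c):
    """Convert a binomial statement about the number c into the interval
    (lower, upper) used for the normal probability; None means no bound."""
    intervals = {
        "exactly": (c - 0.5, c + 0.5),     # includes c
        "at most": (None, c + 0.5),        # includes c
        "fewer than": (None, c - 0.5),     # does not include c
        "at least": (c - 0.5, None),       # includes c
        "more than": (c + 0.5, None),      # does not include c
    }
    return intervals[case]

def normal_interval_between(a, b):
    """'Between a and b, inclusive' widens both ends by 0.5."""
    return (a - 0.5, b + 0.5)

print(normal_interval("at least", 158))    # (157.5, None), i.e. P(x > 157.5)
print(normal_interval("fewer than", 63))   # (None, 62.5), i.e. P(x < 62.5)
print(normal_interval_between(270, 310))   # (269.5, 310.5)
```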




Approximating Binomial Probabilities 


Picturing the World

In a survey of U.S. adults with spouses, 34% responded that they have hidden
purchases from their spouses, as shown in the pie chart. (Adapted from American
Association of Retired Persons)

[Pie chart: Have You Ever Hidden Purchases from Your Spouse?]

Assume that this survey is a true indication of the proportion of the population
who say they have hidden purchases from their spouses. You sample 50 adults with
spouses at random. What is the probability that between 20 and 25, inclusive,
would say they have hidden purchases from their spouses?

GUIDELINES

Using a Normal Distribution to Approximate Binomial Probabilities

In Words                                          In Symbols
1. Verify that a binomial distribution applies.   Specify n, p, and q.
2. Determine whether you can use a normal         Is np ≥ 5?
   distribution to approximate x, the binomial    Is nq ≥ 5?
   variable.
3. Find the mean μ and standard deviation σ       $\mu = np$
   for the distribution.                          $\sigma = \sqrt{npq}$
4. Apply the appropriate continuity correction.   Add 0.5 to (or subtract 0.5
   Shade the corresponding area under the         from) the binomial probability.
   normal curve.
5. Find the corresponding z-score(s).             $z = \frac{x - \mu}{\sigma}$
6. Find the probability.                          Use the Standard Normal Table.
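The six guideline steps can be chained into one function. This is a sketch under our own assumptions: the name `approx_binomial`, inclusive whole-number bounds `lower` and `upper` (with `None` for no bound), and Python's `statistics.NormalDist` in place of the Standard Normal Table. Because z is not rounded to two decimals, answers can differ from table-based answers in the third or fourth decimal place.

```python
import math
from statistics import NormalDist

def approx_binomial(n, p, lower=None, upper=None):
    """Approximate P(lower <= x <= upper) for a binomial x, where lower and
    upper are inclusive whole numbers and None means no bound."""
    q = 1 - p                                     # Step 1: n, p, and q
    if n * p < 5 or n * q < 5:                    # Step 2: np >= 5 and nq >= 5?
        raise ValueError("np or nq is less than 5")
    mu, sigma = n * p, math.sqrt(n * p * q)       # Step 3: mean and std. dev.
    a = -math.inf if lower is None else lower - 0.5   # Step 4: continuity
    b = math.inf if upper is None else upper + 0.5    # correction
    z_a, z_b = (a - mu) / sigma, (b - mu) / sigma     # Step 5: z-scores
    standard = NormalDist()                           # Step 6: probability
    return standard.cdf(z_b) - standard.cdf(z_a)

# "Fewer than 20" of 45 means upper = 19, as in Example 3
print(round(approx_binomial(45, 0.47, upper=19), 4))
```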


EXAMPLE 3

Approximating a Binomial Probability


In a survey of 8- to 18-year-old heavy media users in the United States, 47% 
said they get fair or poor grades (C and below). You randomly select forty-five 
8- to 18-year-old heavy media users in the United States and ask them whether 
they get fair or poor grades. What is the probability that fewer than 20 of them 
respond yes? (Source: Kaiser Family Foundation) 


SOLUTION
From Example 1, you know that you can use a normal distribution with
$\mu = 21.15$ and $\sigma \approx 3.35$
to approximate the binomial distribution. To use a normal distribution, note
that the probability is “fewer than 20.” So, apply the continuity correction by
subtracting 0.5 from 20 and write the probability as

P(x < 20 − 0.5) = P(x < 19.5).

The figure at the left shows a normal curve with $\mu = 21.15$, $\sigma \approx 3.35$, and
the shaded area to the left of 19.5. The z-score that corresponds to x = 19.5 is

$z = \frac{x - \mu}{\sigma} = \frac{19.5 - 21.15}{3.35} \approx -0.49.$

Using the Standard Normal Table,

P(z < −0.49) = 0.3121.

Interpretation The probability that fewer than twenty 8- to 18-year-olds
respond yes is approximately 0.3121, or about 31.21%.
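The arithmetic in this solution can be checked in a few lines. The sketch below uses Python's `statistics.NormalDist` (our choice, standing in for the Standard Normal Table or a graphing calculator) and computes the area twice: once with z rounded to two decimals, as the table requires, and once with the unrounded value, as technology would.

```python
from statistics import NormalDist

mu, sigma = 21.15, 3.35          # values from Example 1 (sigma rounded)
x = 20 - 0.5                     # continuity correction for "fewer than 20"
z = (x - mu) / sigma             # z-score

standard = NormalDist()
p_table = standard.cdf(round(z, 2))               # z rounded, like the table
p_tech = NormalDist(mu=mu, sigma=sigma).cdf(x)    # unrounded, like technology

print(round(z, 2), round(p_table, 4), round(p_tech, 4))
```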


Tech Tip

Recall that you can use technology to find a normal probability. For instance,
in Example 4, you can use a TI-84 Plus to find the probability once the mean,
standard deviation, and continuity correction are calculated. (Use 10,000 for
the upper bound.)




TRY IT YOURSELF 3 


In a survey of adults in the United States, 29% said they have seen a person 
using a mobile device walk in front of a moving vehicle without looking. You 
randomly select 100 adults in the United States and ask them whether they 
have seen a person using a mobile device walk in front of a moving vehicle 
without looking. What is the probability that more than 30 respond yes? 
(Source: Consumer Reports) 

Answer: Page A36 


EXAMPLE 4

Approximating a Binomial Probability

A study on aggressive driving found that 47% of drivers say they have yelled at 
another driver. You randomly select 200 drivers in the United States and ask 
them whether they have yelled at another driver. What is the probability that 


at least 100 drivers will say yes, they have yelled at another driver? (Source: 
American Automobile Association) 


SOLUTION
Because np = 200(0.47) = 94 and nq = 200(0.53) = 106, the binomial
variable x is approximately normally distributed, with

$\mu = np = 94$ and $\sigma = \sqrt{npq} = \sqrt{200(0.47)(0.53)} \approx 7.06.$

Using the continuity correction, you can rewrite the discrete probability
P(x ≥ 100) as the continuous probability P(x > 99.5). The figure shows a
normal curve with $\mu = 94$, $\sigma \approx 7.06$, and the shaded area to the right of 99.5.

[Figure: normal curve; horizontal axis "Number responding yes" from 71 to 116; area to the right of 99.5 shaded]

The z-score that corresponds to 99.5 is

$z = \frac{99.5 - 94}{\sqrt{200(0.47)(0.53)}} \approx 0.78.$

So, the probability that at least 100 drivers will say “yes” is approximately

P(x > 99.5) = P(z > 0.78)
= 1 − P(z < 0.78)
= 1 − 0.7823
= 0.2177.


Interpretation The probability that at least 100 drivers will say “yes” is 
approximately 0.2177, or about 21.8%. 
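Because the normal curve only approximates the binomial distribution, it can be instructive to compare 0.2177 with the exact binomial probability. The sketch below sums the binomial formula for x = 100, 101, ..., 200 using Python's `math.comb`; it is a check on the approximation, not part of the textbook method.

```python
from math import comb

n, p = 200, 0.47
q = 1 - p

# Exact P(x >= 100): add the binomial probabilities for x = 100, ..., 200
exact = sum(comb(n, x) * p**x * q**(n - x) for x in range(100, n + 1))

approx = 0.2177  # normal approximation found above
print(round(exact, 4), round(abs(exact - approx), 4))
```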


TRY IT YOURSELF 4 


In Example 4, what is the probability that at most 80 drivers will say yes, they 
have yelled at another driver? 
Answer: Page A36 




Tech Tip

The approximation in Example 5 is almost the same as the probability found
using the binomial probability feature of a technology tool. For instance,
compare the result in Example 5 with the one found on a TI-84 Plus shown below.

[Screenshot: TI-84 Plus binompdf calculation for n = 75 and p = .624]


EXAMPLE 5

Approximating a Binomial Probability


A study of National Football League (NFL) retirees, ages 50 and older, found 
that 62.4% have arthritis. You randomly select 75 NFL retirees who are 
at least 50 years old and ask them whether they have arthritis. What is the 
probability that exactly 48 will say yes? (Source: University of Michigan, Institute 
for Social Research) 


SOLUTION

Because np = 75(0.624) = 46.8 and nq = 75(0.376) = 28.2, the binomial
variable x is approximately normally distributed, with

$\mu = np = 46.8$ and $\sigma = \sqrt{npq} = \sqrt{75(0.624)(0.376)} \approx 4.19.$

Using the continuity correction, you can rewrite the discrete probability
P(x = 48) as the continuous probability P(47.5 < x < 48.5). The figure
shows a normal curve with $\mu = 46.8$, $\sigma \approx 4.19$, and the shaded area under the
curve between 47.5 and 48.5.

[Figure: normal curve centered at μ = 46.8; horizontal axis "Number responding yes"; area between 47.5 and 48.5 shaded]

The z-score that corresponds to 47.5 is

$z_1 = \frac{47.5 - 46.8}{\sqrt{75(0.624)(0.376)}} \approx 0.17$

and the z-score that corresponds to 48.5 is

$z_2 = \frac{48.5 - 46.8}{\sqrt{75(0.624)(0.376)}} \approx 0.41.$

So, the probability that exactly 48 NFL retirees will say they have arthritis is

P(47.5 < x < 48.5) = P(0.17 < z < 0.41)
= P(z < 0.41) − P(z < 0.17)
= 0.6591 − 0.5675
= 0.0916.


Interpretation The probability that exactly 48 NFL retirees will say they have 
arthritis is approximately 0.0916, or about 9.2%. 
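The Tech Tip compares this answer with a calculator's binomial probability feature. The same comparison can be sketched in Python (our choice of tool, not the textbook's): the exact value applies the binomial formula with `math.comb`, and the approximation uses `statistics.NormalDist` over the continuity-corrected interval 47.5 < x < 48.5 without rounding z.

```python
from math import comb
from statistics import NormalDist

n, p, x = 75, 0.624, 48
q = 1 - p

# Exact binomial probability of exactly 48 successes
exact = comb(n, x) * p**x * q**(n - x)

# Normal approximation: area between x - 0.5 and x + 0.5
normal = NormalDist(mu=n * p, sigma=(n * p * q) ** 0.5)
approx = normal.cdf(x + 0.5) - normal.cdf(x - 0.5)

print(round(exact, 4), round(approx, 4))
```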


TRY IT YOURSELF 5 


The study in Example 5 found that 32.0% of all men in the United States 
ages 50 and older have arthritis. You randomly select 75 men in the United 
States who are at least 50 years old and ask them whether they have arthritis. 
What is the probability that exactly 15 will say yes? (Source: University of 
Michigan, Institute for Social Research) 

Answer: Page A36 




5.5 EXERCISES

For Extra Help: MyLab Statistics


Building Basic Skills and Vocabulary 


In Exercises 1-4, the sample size n, probability of success p, and probability of 
failure q are given for a binomial experiment. Determine whether you can use a 
normal distribution to approximate the distribution of x. 


1. n = 24, p = 0.85, q = 0.15    2. n = 15, p = 0.70, q = 0.30
3. n = 18, p = 0.90, q = 0.10    4. n = 20, p = 0.65, q = 0.35


In Exercises 5–8, match the binomial probability statement with its corresponding
normal distribution probability statement (a)–(d) after a continuity correction.


5. P(x > 109)    (a) P(x > 109.5)
6. P(x ≥ 109)    (b) P(x < 108.5)
7. P(x ≤ 109)    (c) P(x < 109.5)
8. P(x < 109)    (d) P(x > 108.5)


In Exercises 9-14, write the binomial probability in words. Then, use a continuity 
correction to convert the binomial probability to a normal distribution probability. 


9. P(x < 25)    10. P(x = 110)    11. P(x = 33)
12. P(x > 65)    13. P(x < 150)    14. P(55 < x < 60)

Graphical Analysis In Exercises 15 and 16, write the binomial probability
and the normal probability for the shaded region of the graph. Find the value of
each probability and compare the results.


15. [Graph: binomial distribution with n = 16 and p = 0.4; vertical axis P(x) from 0 to 0.24; shaded region]


Using and Interpreting Concepts 


Approximating a Binomial Distribution In Exercises 17 and 18, a
binomial experiment is given. Determine whether you can use a normal distribution
to approximate the binomial distribution. If you can, find the mean and standard
deviation. If you cannot, explain why.


17. Alcohol-Impaired Driving In a recent year, alcohol-impaired driving was 
the cause of 31% of motor vehicle fatalities. You randomly select 30 motor 
vehicle fatalities and determine whether alcohol-impaired driving was the 
cause. (Source: WalletHub) 


18. Cell Phone and Internet Privileges Sixty-five percent of parents of 
teenagers have taken their teenager’s cell phone or Internet privileges away 
as a punishment. You randomly select 10 parents of teenagers and ask them 
whether they have taken their teenager’s cell phone or Internet privileges 
away as a punishment. (Source: Pew Research Center) 




Approximating Binomial Probabilities In Exercises 19–26, determine
whether you can use a normal distribution to approximate the binomial distribution.
If you can, use the normal distribution to approximate the indicated probabilities
and sketch their graphs. If you cannot, explain why and use a binomial distribution
to find the indicated probabilities. Identify any unusual events. Explain.


19. Double Charging A survey of cross-border money transfers found that
in 38% of cases, the beneficiary account has suffered double charging, in
contradiction to European Union rules. You randomly select 200 cross-border
money transfers. Find the probability that the number of transfers that have
encountered double charging at the beneficiary account is (a) exactly 80,
(b) at least 80, and (c) fewer than 80. (Source: European Union)

20. Online Shopping A survey of German adults found that 73% of customers
who shop online pay through PayPal. You randomly select 150 German
adults who shop online. Find the probability that the number who pay
through PayPal is (a) at least 100, (b) fewer than 100, and (c) more than
120. (Source: Statista)

21. Screen Lock A survey of U.S. adults found that 28% of those who own
smartphones do not use a screen lock or other security features to access
their phone. You randomly select 150 U.S. adults who own smartphones.
Find the probability that the number who do not use a screen lock or other
security features to access their phone is (a) at most 40, (b) more than 50,
and (c) between 20 and 30, inclusive. (Source: Pew Research Center)

22. Online Account Passwords A survey of U.S. adults found that 39% of
those who have online accounts use the same or very similar passwords
for many of their accounts. You randomly select 500 U.S. adults who have
online accounts. Find the probability that the number who use the same or
very similar passwords for many of their accounts is (a) exactly 175, (b) no
more than 225, and (c) at most 200. (Source: Pew Research Center)

23. Favorite Sport A survey of U.S. adults found that 33% name professional
football as their favorite sport. You randomly select 14 U.S. adults and ask
them to name their favorite sport. Find the probability that the number
who name professional football as their favorite sport is (a) exactly 8,
(b) at most 4, and (c) fewer than 6. (Source: The Harris Poll)

24. Favorite Beverage A survey of U.S. adults ages 21 and older who are
regular drinkers found that 38% name beer as their favorite beverage. You
randomly select 25 U.S. adults ages 21 and older who are regular drinkers
and ask them to name their favorite beverage. Find the probability that the
number who name beer as their favorite beverage is (a) no fewer than 10,
(b) at least 13, and (c) between 6 and 8, inclusive. (Source: The Harris Poll)

25. College Graduates Fifty-one percent of U.S. college graduates consider
themselves underemployed. You randomly select 250 U.S. college graduates
and ask them whether they consider themselves underemployed. Find the
probability that the number who consider themselves underemployed is
(a) no more than 125, (b) no fewer than 135, and (c) between 100 and 125,
inclusive. (Source: Accenture)

26. College Graduates Fourteen percent of U.S. college graduates want to
work for a large company. You randomly select 1500 U.S. college graduates
and ask them whether they want to work for a large company. Find the
probability that the number who want to work for a large company is
(a) exactly 175, (b) no more than 225, and (c) at most 200. (Source: Accenture)




27. Minimum Wage About 3.3% of hourly paid U.S. workers earn the 
prevailing minimum wage or less. A grocery chain offers discount rates to 
companies that have at least 30 employees who earn the prevailing minimum 
wage or less. Find the probability that each company will get the discount. 
(Source: U.S. Bureau of Labor Statistics) 

(a) Company A has 700 employees. 
(b) Company B has 750 employees. 


(c) Company C has 1000 employees. 

28. Education A survey of U.S. adults found that 8% believe the biggest 
problem in schools today is poor teaching. You randomly select a sample of 
U.S. adults. Find the probability that more than 100 U.S. adults believe the
biggest problem in schools today is poor teaching. (Source: Rasmussen Reports) 
(a) You select 1000 U.S. adults. 

(b) You select 1250 U.S. adults. 
(c) You select 1500 U.S. adults. 


Extending Concepts 


Getting Physical The figure shows the results of a survey of U.S. adults ages 33 
to 51 who were asked whether they participated in a sport. Seventy percent of U.S. 
adults ages 33 to 51 said they regularly participated in at least one sport, and they 
gave their favorite sport. Use this information in Exercises 29 and 30. 


[Figure: How adults get physical, listing favorite sports: swimming; (tie) bicycling, golf; hiking; (tie) softball, walking; fishing; tennis; (tie) bowling, running; aerobics]


29. You randomly select 250 U.S. adults ages 33 to 51 and ask them whether they 
regularly participate in at least one sport. You find that 60% say no. How 
likely is this result? Do you think this sample is a good one? Explain your 
reasoning. 


30. You randomly select 300 U.S. adults ages 33 to 51 and ask them whether they 
regularly participate in at least one sport. Of the 200 who say yes, 9% say 
they participate in hiking. How likely is this result? Do you think this sample 
is a good one? Explain your reasoning. 


Testing a Drug A drug manufacturer claims that a drug cures a rare skin 
disease 75% of the time. The claim is checked by testing the drug on 100 patients. If 
at least 70 patients are cured, then this claim will be accepted. Use this information 
in Exercises 31 and 32. 


31. Find the probability that the claim will be rejected, assuming that the 
manufacturer’s claim is true. 


32. Find the probability that the claim will be accepted, assuming that the 
actual probability that the drug cures the skin disease is 65%. 


USES AND ABUSES    Statistics in the Real World


Uses 


Normal distributions can be used to describe many real-life situations and are 
widely used in the fields of science, business, and psychology. They are the most 
important probability distributions in statistics and can be used to approximate 
other distributions, such as discrete binomial distributions. 

The most incredible application of the normal distribution lies in the Central 
Limit Theorem. This theorem states that no matter what type of distribution a 
population may have, as long as the size of each random sample is at least 30, the 
distribution of sample means will be approximately normal. When a population 
is normal, the distribution of sample means is normal for any random sample of 
size n. 

The normal distribution is essential to sampling theory. Sampling theory 
forms the basis of statistical inference, which you will study in the next chapter. 


Abuses 


Consider a population that is normally distributed, with a mean of 100 and 
standard deviation of 15. It would not be unusual for an individual value taken 
from this population to be 115 or more. In fact, this will happen almost 16% of 
the time. It would, however, be highly unusual to take a random sample of 100
values from that population and obtain a sample mean of 115 or more. Because 
the population is normally distributed, the mean of the sampling distribution of 
sample means will be 100, and the standard deviation will be 1.5. A sample mean 
of 115 lies 10 standard deviations above the mean. This would be an extremely 
unusual event. When an event this unusual occurs, it is a good idea to question 
the original parameters or the assumption that the population is normally 
distributed. 
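The numbers in this paragraph can be reproduced with Python's `statistics.NormalDist` and `math.erfc` (our choice of tools). The extreme tail for the sample mean is computed with `erfc` because a cumulative probability this close to 1 rounds to exactly 1.0 in floating point, which would make 1 − cdf come out as 0.

```python
import math
from statistics import NormalDist

mu, sigma, n = 100, 15, 100

# A single value of 115 or more: z = 1, so P is about 0.1587 (almost 16%)
p_individual = 1 - NormalDist(mu, sigma).cdf(115)

# Sampling distribution of sample means: mean 100, std. dev. 15 / sqrt(100)
se = sigma / math.sqrt(n)        # 1.5
z = (115 - mu) / se              # 10 standard deviations above the mean

# Upper-tail area P(z >= 10) via the complementary error function
p_sample_mean = 0.5 * math.erfc(z / math.sqrt(2))

print(round(p_individual, 4), se, z, p_sample_mean)
```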

Although normal distributions are common in many populations, you should
not try to force nonnormal data to fit a normal distribution. The statistics
used for normal distributions are often inappropriate when the distribution is
nonnormal. For instance, some economists argue that financial risk managers’
reliance on normal distributions to model stock market behavior is a mistake
because normal distributions do not accurately predict unusual events like
market crashes.


EXERCISES 


1. Is It Unusual? A population is normally distributed, with a mean of 100 
and a standard deviation of 15. Determine whether either event is unusual. 
Explain your reasoning. 


a. The mean of a sample of 3 is 112 or more. 
b. The mean of a sample of 75 is 105 or more. 

2. Find the Error The mean age of students at a high school is 16.5, with a 
standard deviation of 0.7. You use the Standard Normal Table to determine 


that the probability of selecting one student at random whose age is more than 
17.5 years is about 8%. What is the error in this problem? 


3. Give an example of a distribution that might be nonnormal. 




5 Chapter Summary 


What Did You Learn?

Section 5.1
» How to interpret graphs of normal probability distributions (Examples 1, 2; Review Exercises 1–4)
» How to find areas under the standard normal curve (Examples 3–6; Review Exercises 5–26)

Section 5.2
» How to find probabilities for normally distributed variables using a table and using technology (Examples 1–3; Review Exercises 27–36)

Section 5.3
» How to find a z-score given the area under the normal curve (Examples 1, 2; Review Exercises 37–44)
» How to transform a z-score to an x-value (Example 3; Review Exercises 45, 46)
  $x = \mu + z\sigma$
» How to find a specific data value of a normal distribution given the probability (Examples 4, 5; Review Exercises 47–50)

Section 5.4
» How to find sampling distributions and verify their properties (Example 1; Review Exercises 51, 52)
» How to interpret the Central Limit Theorem (Examples 2, 3; Review Exercises 53, 54)
  $\mu_{\bar{x}} = \mu$    Mean of the sample means
  $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$    Standard deviation of the sample means
» How to apply the Central Limit Theorem to find the probability of a sample mean (Examples 4–6; Review Exercises 55–60)

Section 5.5
» How to determine when a normal distribution can approximate a binomial distribution (Example 1; Review Exercises 61, 62)
  $\mu = np$    Mean
  $\sigma = \sqrt{npq}$    Standard deviation
» How to find the continuity correction (Example 2; Review Exercises 63–68)
» How to use a normal distribution to approximate binomial probabilities (Examples 3–5; Review Exercises 69, 70)
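The formulas collected in this summary translate directly into small helper functions. The sketch below is our own illustration (function names are hypothetical), using Python's `statistics.NormalDist.inv_cdf` to find a z-score from a cumulative area and the population with μ = 100 and σ = 15 from the Abuses discussion as sample input.

```python
import math
from statistics import NormalDist

def x_from_z(mu, sigma, z):
    """Section 5.3: transform a z-score to an x-value, x = mu + z * sigma."""
    return mu + z * sigma

def sample_mean_params(mu, sigma, n):
    """Section 5.4: mean and standard deviation of the sample means."""
    return mu, sigma / math.sqrt(n)

def binomial_params(n, p):
    """Section 5.5: mean np and standard deviation sqrt(npq)."""
    q = 1 - p
    return n * p, math.sqrt(n * p * q)

# Data value at the 90th percentile when mu = 100 and sigma = 15:
# find z from the cumulative area 0.90, then convert z to x.
z_90 = NormalDist().inv_cdf(0.90)
print(round(x_from_z(100, 15, z_90), 2))
print(sample_mean_params(100, 15, 100))
print(binomial_params(200, 0.47))
```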






Review Exercises 


Section 5.1 


In Exercises 1 and 2, use the normal curve to estimate the mean and standard
deviation.

[Graphs for Exercises 1 and 2: normal curves over x-axis values 5 to 25 and 40 to 75]


In Exercises 3 and 4, use the normal curves shown at the left.

3. Which normal curve has the greatest mean? Explain your reasoning.

4. Which normal curve has the greatest standard deviation? Explain your
reasoning.

[FIGURE FOR EXERCISES 3 AND 4: normal curves over an x-axis from 80 to 140]


In Exercises 5 and 6, find the area of the indicated region under the standard 
normal curve. If convenient, use technology to find the area. 


[Graphs for Exercises 5 and 6: shaded regions under the standard normal curve; z-axis labels include 0, 0.46, and −2.35]


In Exercises 7-18, find the indicated area under the standard normal curve. 
If convenient, use technology to find the area. 


7. To the left of z = 0.35
8. To the left of z = −1.95
9. To the right of z = −0.46
10. To the left of z = −1.625
11. To the right of z = 2.62
12. To the left of z = −0.11
13. Between z = −1.96 and z = 0
14. Between z = −1.55 and z = 1.04
15. Between z = 0.15 and z = 1.64
16. Between z = −2.68 and z = 2.68
17. To the left of z = −1.62 and to the right of z = 1.62
18. To the left of z = 0.64 and to the right of z = 3.415


The scores for the reading portion of the ACT test are normally distributed. In 
a recent year, the mean test score was 21.3 and the standard deviation was 6.5. 
The test scores of four students selected at random are 17, 29, 8, and 23. Use this 
information in Exercises 19 and 20. (Source: ACT, Inc.) 


19. Find the z-score that corresponds to each value. 


20. Determine whether any of the values are unusual. 




In Exercises 21-26, find the indicated probability using the standard normal 
distribution. If convenient, use technology to find the probability. 


21. P(z < 1.36)
22. P(z > −0.74)
23. P(−1.12 < z < 1.75)
24. P(0.42 < z < 3.15)
25. P(z < −1.50 or z > 1.50)
26. P(z < 0 or z > 1.68)


Section 5.2 


In Exercises 27–32, the random variable x is normally distributed with mean
$\mu = 74$ and standard deviation $\sigma = 8$. Find the indicated probability.


27. P(x < 82) 28. P(x > 68.4) 
29. P(x > 70) 30. P(x < 57) 
31. P(58 < x < 66) 32. P(72 < x < 82) 


In Exercises 33 and 34, find the indicated probabilities. If convenient, use 
technology to find the probabilities. 


33. Yearly amounts of black carbon emissions from cars in India are normally 
distributed, with a mean of 14.7 gigagrams per year and a standard deviation 
of 11.5 gigagrams per year. Find the probability that the amount of black 
carbon emissions from cars in India for a randomly selected year is


(a) less than 12.3 gigagrams per year. 

(b) between 15.4 and 19.6 gigagrams per year. 

(c) greater than 17.7 gigagrams per year. (Adapted from Atmospheric Chemistry 
and Physics) 


34. The daily surface concentration of carbonyl sulfide on the Indian Ocean is 
normally distributed, with a mean of 9.1 picomoles per liter and a standard 
deviation of 3.5 picomoles per liter. Find the probability that on a randomly 
selected day, the surface concentration of carbonyl sulfide on the Indian 
Ocean is 


(a) between 5.1 and 15.7 picomoles per liter. 
(b) between 10.5 and 12.3 picomoles per liter. 


(c) more than 11.1 picomoles per liter. (Source: Atmospheric Chemistry and 
Physics) 


35. Determine whether any of the events in Exercise 33 are unusual. Explain 
your reasoning. 


36. Determine whether any of the events in Exercise 34 are unusual. Explain 
your reasoning. 


Section 5.3 


In Exercises 37-42, use the Standard Normal Table or technology to find the 
z-score that corresponds to the cumulative area or percentile. 


37. 0.4364 38. 0.1 39. 0.993 
40. P, 41. Poo 42. Prg 


43. Find the z-score that has 20.9% of the distribution’s area to its right. 


44. Find the positive z-score for which 94% of the distribution’s area lies
between −z and z.




[FIGURE FOR EXERCISES 45–50: Braking Distance of a Sedan, a normal curve with $\mu = 132$ ft and $\sigma = 4.53$ ft; horizontal axis: Braking distance (in feet), 115 to 145]


On a dry surface, the braking distances (in feet), from 60 miles per hour to a 
complete stop, of a sedan can be approximated by a normal distribution, as shown 
in the figure at the left. Use this information in Exercises 45-50. 


45. Find the braking distance of a sedan that corresponds to z = −2.75.

46. Find the braking distance of a sedan that corresponds to z = 1.6.
47. What braking distance of a sedan represents the 90th percentile? 
48. What braking distance of a sedan represents the first quartile? 


49. What is the shortest braking distance of a sedan that can be in the top 15% 
of braking distances? 


50. What is the longest braking distance of a sedan that can be in the bottom 
20% of braking distances? 


Section 5.4 


In Exercises 51 and 52, a population and sample size are given. (a) Find the mean 
and standard deviation of the population. (b) List all samples (with replacement) 
of the given size from the population and find the mean of each. (c) Find the mean 
and standard deviation of the sampling distribution of sample means and compare 
them with the mean and standard deviation of the population. 


51. The goals scored in a season by the four starting defenders on a soccer team 
are 1, 2, 0, and 3. Use a sample size of 2. 


52. The minutes of overtime reported by each of the three executives at a 
corporation are 90, 120, and 210. Use a sample size of 3. 


In Exercises 53 and 54, find the mean and standard deviation of the indicated 
sampling distribution of sample means. Then sketch a graph of the sampling 
distribution. 


53. The per capita electric power consumption level in a recent year in Ecuador 
is normally distributed, with a mean of 471.5 kilowatt-hours and a standard 
deviation of 187.9 kilowatt-hours. Random samples of size 35 are drawn 
from this population, and the mean of each sample is determined. (Source: 
Latin America Journal of Economics) 


54. The test scores for the Law School Admission Test (LSAT) in a recent year 
are normally distributed, with a mean of 155.69 and a standard deviation 
of 5.05. Random samples of size 40 are drawn from this population, and the 
mean of each sample is determined. (Source: Law School Admission Council) 


In Exercises 55-60, find the indicated probabilities and interpret the results.


55. Refer to Exercise 33. A random sample of 2 years is selected. Find the 
probability that the mean amount of black carbon emissions for the sample 
is (a) less than 12.3 gigagrams per year, (b) between 15.4 and 19.6 gigagrams 
per year, and (c) greater than 17.7 gigagrams per year. Compare your 
answers with those in Exercise 33. 


56. Refer to Exercise 34. A random sample of six days is selected. Find the 
probability that the mean surface concentration of carbonyl sulfide for the 
sample is (a) between 5.1 and 15.7 picomoles per liter, (b) between 10.5 and 
12.3 picomoles per liter, and (c) more than 11.1 picomoles per liter. Compare 
your answers with those in Exercise 34. 


Review Exercises 311 


57. The mean ACT composite score in a recent year is 20.8. A random sample of 
36 ACT composite scores is selected. What is the probability that the mean 
score for the sample is (a) less than 21.6, (b) more than 19.8, and (c) between 
20.5 and 21.5? Assume σ = 5.6. (Source: ACT, Inc.)


58. The mean MCAT total score in a recent year is 500. A random sample of 
32 MCAT total scores is selected. What is the probability that the mean score 
for the sample is (a) less than 503, (b) more than 502, and (c) between 498 
and 501? Assume σ = 10.6. (Source: Association of American Medical Colleges)


59. The mean annual salary for intermediate level life insurance underwriters 
is about $61,000. A random sample of 45 intermediate level life insurance 
underwriters is selected. What is the probability that the mean annual salary 
of the sample is (a) less than $60,000 and (b) more than $63,000? Assume 
σ = $11,000. (Adapted from Salary.com)


60. The mean annual salary for magnetic resonance imaging (MRI) technologists 
is about $72,000. A random sample of 50 MRI technologists is selected. What 
is the probability that the mean annual salary of the sample is (a) less than 
$71,500 and (b) more than $74,500? Assume σ = $10,000. (Adapted from
Salary.com)
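Exercises 55-60 rely on the fact that the sampling distribution of x̄ has mean μ and standard error σ/√n. This minimal Python sketch uses the numbers from Exercise 57 and assumes a normal sampling distribution, which n = 36 ≥ 30 justifies there. The helper `sample_mean_distribution` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def sample_mean_distribution(mu, sigma, n):
    """Sampling distribution of x-bar: mean mu, standard error sigma / sqrt(n)."""
    return NormalDist(mu, sigma / sqrt(n))


# Exercise 57: mu = 20.8, sigma = 5.6, n = 36
xbar = sample_mean_distribution(20.8, 5.6, 36)
p_less = xbar.cdf(21.6)                        # (a) P(x-bar < 21.6)
p_more = 1 - xbar.cdf(19.8)                    # (b) P(x-bar > 19.8)
p_between = xbar.cdf(21.5) - xbar.cdf(20.5)    # (c) P(20.5 < x-bar < 21.5)
print(round(p_less, 4), round(p_more, 4), round(p_between, 4))
```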


Section 5.5 


In Exercises 61 and 62, a binomial experiment is given. Determine whether you 
can use a normal distribution to approximate the binomial distribution. If you can,
find the mean and standard deviation. If you cannot, explain why. 


61. A survey of U.S. adults found that 75% support labeling legislation for 
genetically modified organisms (GMOs). You randomly select 20 U.S. 
adults and ask them whether they support labeling legislation for genetically 
modified organisms (GMOs). (Source: The Harris Poll) 


62. A survey of U.S. likely voters found that 11% think Congress is doing a 
good or excellent job. You randomly select 45 U.S. likely voters and ask 
them whether they think Congress is doing a good or excellent job. (Source: 
Rasmussen Reports) 
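The test in Exercises 61 and 62 is whether np ≥ 5 and nq ≥ 5. This minimal Python sketch uses a hypothetical helper that returns the mean μ = np and standard deviation σ = √(npq) when the approximation is allowed, and `None` otherwise.

```python
from math import sqrt


def normal_approximation(n, p):
    """Return (mu, sigma) when np >= 5 and nq >= 5; otherwise return None."""
    q = 1 - p
    if n * p < 5 or n * q < 5:
        return None
    return n * p, sqrt(n * p * q)


print(normal_approximation(20, 0.75))  # Exercise 61
print(normal_approximation(45, 0.11))  # Exercise 62
```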


In Exercises 63-68, write the binomial probability in words. Then, use a continuity 
correction to convert the binomial probability to a normal distribution probability. 


63. P(x = 28) 64. P(x > 18) 65. P(x = 30) 
66. P(x < 32) 67. P(x < 50) 68. P(54 < x < 64) 
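A continuity correction widens each integer value x into the interval from x - 0.5 to x + 0.5. The Python sketch below returns the corrected interval for each comparison form, with `None` marking an unbounded side. The helper and its operator strings are hypothetical and chosen only for illustration.

```python
def continuity_correction(op, a, b=None):
    """Map a discrete binomial event to the interval used with the normal curve.

    Returns (lower, upper); None means that side is unbounded.
    """
    if op == "==":
        return (a - 0.5, a + 0.5)
    if op == ">=":
        return (a - 0.5, None)
    if op == ">":
        return (a + 0.5, None)
    if op == "<=":
        return (None, a + 0.5)
    if op == "<":
        return (None, a - 0.5)
    if op == "between_inclusive":   # a <= x <= b
        return (a - 0.5, b + 0.5)
    if op == "between_exclusive":   # a < x < b
        return (a + 0.5, b - 0.5)
    raise ValueError(f"unknown operator: {op}")


print(continuity_correction("==", 28))                     # P(x = 28)
print(continuity_correction("between_exclusive", 54, 64))  # P(54 < x < 64)
```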


In Exercises 69 and 70, determine whether you can use a normal distribution to 
approximate the binomial distribution. If you can, use the normal distribution to 
approximate the indicated probabilities and sketch their graphs. If you cannot, 
explain why and use a binomial distribution to find the indicated probabilities. 


69. A survey of U.S. adults found that 32% have an online account with their 
healthcare provider. You randomly select 70 U.S. adults and ask them 
whether they have an online account with their healthcare provider. Find 
the probability that the number who have an online account with their 
healthcare provider is (a) at most 15, (b) exactly 25, and (c) greater than 30. 
Identify any unusual events. Explain. (Source: Pew Research Center) 


70. Sixty-five percent of U.S. college graduates are employed in their field of 
study. You randomly select 20 U.S. college graduates and ask them whether 
they are employed in their field of study. Find the probability that the number 
who are employed in their field of study is (a) exactly 15, (b) less than 10, and 
(c) between 20 and 35. Identify any unusual events. Explain. (Source: Accenture) 
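When the normal approximation is allowed, you can check it against the exact binomial probability. When it is not allowed, the exact computation is the fallback that Exercises 69 and 70 ask for. This minimal Python sketch assumes Python 3.8 or later for `math.comb`, and the helper names are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_pmf(n, p, x):
    """Exact binomial probability P(x) = nCx * p^x * q^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_cdf(n, p, k):
    """Exact P(x <= k)."""
    return sum(binomial_pmf(n, p, x) for x in range(0, k + 1))


# Exercise 69 setting: n = 70, p = 0.32, so np and nq both exceed 5
n, p = 70, 0.32
approx = NormalDist(n * p, sqrt(n * p * (1 - p)))
exact_at_most_15 = binomial_cdf(n, p, 15)
normal_at_most_15 = approx.cdf(15.5)  # continuity correction: x <= 15 becomes x < 15.5
print(round(exact_at_most_15, 4), round(normal_at_most_15, 4))
```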




5 Chapter Quiz 


Take this quiz as you would take a quiz in class. After you are done, check your 
work against the answers given in the back of the book. 


1. Find each probability using the standard normal distribution. 


(a) P(z > -1.68)

(b) P(z < 2.23)

(c) P(-0.47 < z < 0.47)

(d) P(z < -1.992 or z > -0.665)


2. The random variable x is normally distributed with the given parameters.
Find each probability.

(a) μ = 9.2, σ ≈ 1.62, P(x < 5.97)

(b) μ = 87, σ ≈ 19, P(x > 40.5)

(c) μ = 5.5, σ ≈ 0.08, P(5.36 < x < 5.64)

(d) μ = 18.5, σ ≈ 4.25, P(19.6 < x < 26.1)
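Technology can stand in for the Standard Normal Table in Exercises 1 and 2 of the quiz. This is a minimal Python sketch, assuming Python 3.8 or later for `statistics.NormalDist`, with a hypothetical helper `prob_between`. The results may differ from table values in the fourth decimal place.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution: mean 0, standard deviation 1


def prob_between(dist, a, b):
    """P(a < x < b) for a continuous distribution."""
    return dist.cdf(b) - dist.cdf(a)


print(round(1 - Z.cdf(-1.68), 4))              # Exercise 1(a): P(z > -1.68)
print(round(prob_between(Z, -0.47, 0.47), 4))  # Exercise 1(c)
x = NormalDist(9.2, 1.62)
print(round(x.cdf(5.97), 4))                   # Exercise 2(a): P(x < 5.97)
```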


In a standardized IQ test, scores are normally distributed, with a mean score of
100 and a standard deviation of 15. Use this information in Exercises 3-10.
(Adapted from 123test)


3. Find the probability that a randomly selected person has an IQ score higher
than 125. Is this an unusual event? Explain.

4. Find the probability that a randomly selected person has an IQ score
between 95 and 105. Is this an unusual event? Explain.

5. What percent of the IQ scores are greater than 112?

6. Out of 2000 randomly selected people, about how many would you expect to
have IQ scores less than 90?

7. What is the lowest score that would still place a person in the top 5% of
the scores?

8. What is the highest score that would still place a person in the bottom 10%
of the scores?

9. A random sample of 60 people is selected from this population. What is
the probability that the mean IQ score of the sample is greater than 105?
Interpret the result.

10. Are you more likely to randomly select one person with an IQ score greater
than 105 or are you more likely to randomly select a sample of 15 people with
a mean IQ score greater than 105? Explain.


In a survey of U.S. adults, 16% say they have had someone take over their email 
accounts without their permission. You randomly select 250 U.S. adults and ask 
them whether they have had someone take over their email accounts without their 
permission. Use this information in Exercises 11 and 12. (Source: Pew Research Center) 


11. Determine whether you can use a normal distribution to approximate the
binomial distribution. If you can, find the mean and standard deviation.
If you cannot, explain why.

12. Find the probability that the number of U.S. adults who say they have had
someone take over their email accounts without their permission is (a) at most
40, (b) less than 45, and (c) exactly 48. Identify any unusual events. Explain.


Chapter Test 313 


5 Chapter Test 


Take this test as you would take a test in class. 


1. The mean per capita daily water consumption in a village in Bangladesh is 
about 83 liters per person and the standard deviation is about 11.9 liters per 
person. Random samples of size 50 are drawn from this population and the 
mean of each sample is determined. (Source: Journal of Education and Social 
Sciences) 


(a) Find the mean and standard deviation of the sampling distribution of 
sample means. 


(b) What is the probability that the mean per capita daily water consumption 
for a given sample is more than 85 liters per person? 


(c) What is the probability that the mean per capita daily water consumption 
for a given sample is between 80 and 82 liters per person? 


In Exercises 2-4, the random variable x is normally distributed with mean μ = 18
and standard deviation σ = 7.6.


2. Find each probability. 

(a) P(x > 20)   (b) P(0 < x < 5)   (c) P(x < 9 or x > 27)
3. Find the value of x that has 88.3% of the distribution’s area to its left. 
4. Find the value of x that has 64.8% of the distribution’s area to its right. 


In Exercises 5 and 6, determine whether you can use a normal distribution to 
approximate the binomial distribution. If you can, use the normal distribution to 
approximate the indicated probabilities and sketch their graphs. If you cannot, 
explain why and use a binomial distribution to find the indicated probabilities. 


5. Sixty-nine percent of U.S. college graduates expect to stay at their first 
employer for three or more years. You randomly select 18 U.S. college 
graduates and ask them whether they expect to stay at their first employer for 
three or more years. Find the probability that the number who expect to stay 
at their first employer for three or more years is (a) exactly 10, (b) less than 7, 
and (c) at least 15. Identify any unusual events. Explain. (Source: Accenture) 


6. A survey of U.S. adults found that 86% of those who use the Internet keep 
track of their online passwords in their heads. You randomly select 30 U.S. 
adults who use the Internet. Find the probability that the number who keep 
track of their online passwords in their heads is (a) exactly 25, (b) more 
than 25, and (c) less than 25. Identify any unusual events. Explain. (Source: 
Pew Research Center) 


The per capita disposable income for residents of a U.S. city in a recent year is 
normally distributed, with a mean of about $44,000 and a standard deviation of 
about $2450. Use this information in Exercises 7-10. 


7. Find the probability that the disposable income of a resident is more than 
$45,000. Is this an unusual event? Explain. 


8. Out of 800 residents, about how many would you expect to have a disposable 
income of between $40,000 and $42,000? 


9. Between what two values does the middle 60% of disposable incomes lie? 


10. Random samples of size 8 are drawn from the population and the mean of 
each sample is determined. Is the sampling distribution of sample means 
normally distributed? Explain. 


Putting it all together 


REAL DECISIONS 


You work for a pharmaceutical company as a statistical process
analyst. Your job is to analyze processes and make sure they are
in statistical control. In one process, a machine is supposed to add 
9.8 milligrams of a compound to a mixture in a vial. (Assume this 
process can be approximated by a normal distribution with a standard 
deviation of 0.05.) The acceptable range of amounts of the compound 
added is 9.65 milligrams to 9.95 milligrams, inclusive. 

Because of an error with the release valve, the setting on the 
machine “shifts” from 9.8 milligrams. To check that the machine is 
adding the correct amount of the compound into the vials, you select 
at random three samples of five vials and find the mean amount of 
the compound added for each sample. A coworker asks why you take 
3 samples of size 5 and find the mean instead of randomly choosing and 
measuring the amounts in 15 vials individually to check the machine’s 
settings. (Note: Both samples are chosen without replacement.) 


EXERCISES 


1. Sampling Individuals 


Assume the machine shifts and the distribution of the amount of the 
compound added now has a mean of 9.96 milligrams and a standard 
deviation of 0.05 milligram. You select one vial and determine how 
much of the compound was added. 


(a) What is the probability that you select a vial that is within the 
acceptable range (in other words, you do not detect that the 
machine has shifted)? (See figure.) 


(b) You randomly select 15 vials. What is the probability that you 
select at least one vial that is within the acceptable range? 


2. Sampling Groups of Five
Assume the machine shifts and is filling the vials with a mean amount
of 9.96 milligrams and a standard deviation of 0.05 milligram. You
select five vials and find the mean amount of compound added.


(a) What is the probability that you select a sample of five vials that 
has a mean that is within the acceptable range? (See figure.) 


(b) You randomly select three samples of five vials. What is the 
probability that you select at least one sample of five vials that 
has a mean that is within the acceptable range? 


(c) Which is more sensitive to a shift of parameters—an individual 
random selection or a randomly selected sample mean? 
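The comparison in Exercises 1 and 2 can be sketched in Python. This is a minimal illustration, not a method the text prescribes. It assumes the vials are independent, which is reasonable when the production run is large even though the samples are drawn without replacement, and it uses the standard error 0.05/√5 for the mean of five vials. The helper names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

LOW, HIGH = 9.65, 9.95            # acceptable range (milligrams), inclusive
SHIFTED_MEAN, SIGMA = 9.96, 0.05  # machine after the shift


def p_within_range(dist):
    """Probability that a value from dist falls in the acceptable range."""
    return dist.cdf(HIGH) - dist.cdf(LOW)


def p_at_least_one(p_single, trials):
    """P(at least one success in independent trials) = 1 - (1 - p)^trials."""
    return 1 - (1 - p_single) ** trials


individual = NormalDist(SHIFTED_MEAN, SIGMA)
sample_mean = NormalDist(SHIFTED_MEAN, SIGMA / sqrt(5))  # mean of five vials

p1 = p_within_range(individual)   # Exercise 1(a)
p2 = p_within_range(sample_mean)  # Exercise 2(a)
print(round(p1, 4), round(p_at_least_one(p1, 15), 4))  # 1(a), 1(b)
print(round(p2, 4), round(p_at_least_one(p2, 3), 4))   # 2(a), 2(b)
```

A smaller probability of landing in the acceptable range means the shift is less likely to go undetected.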


3. Writing an Explanation 


Write a paragraph to your coworker explaining why you take 
3 samples of size 5 and find the mean of each sample instead 
of randomly choosing and measuring the amounts in 15 vials 
individually to check the machine’s setting. 




[Figure: Original distribution of individual vials and the distribution when the machine shifts, with the upper limit of the acceptable range marked; horizontal axis: Masses (in milligrams), 9.7 to 10.1.]
FIGURE FOR EXERCISE 1

[Figure: Distribution when the machine shifts (mean = 9.96), with the upper limit of the acceptable range marked; horizontal axis: Masses (in milligrams), 9.7 to 10.1.]
FIGURE FOR EXERCISE 2


TECHNOLOGY 


Age Distribution in California

One of the jobs of the U.S. Census Bureau is to keep track of the age
distribution in the country and in each of the states. The estimated age
distribution in California in 2016 is shown in the table and the histogram.

Class    Class midpoint    Relative frequency
0-4      2                 6.9%
5-9      7                 6.9%
10-14    12                7.1%
15-19    17                ...
20-24    22                7.2%
25-29    27                7.3%
30-34    32                6.9%
35-39    37                6.9%
40-44    42                7.0%
45-49    47                7.2%
50-54    52                6.9%
55-59    57                6.0%
60-64    62                5.0%
65-69    67                3.5%
70-74    72                2.6%
75-79    77                2.1%
80-84    82                1.6%
85-89    87                1.0%
90-94    92                0.4%
95-99    97                0.1%

[Histogram: Age Distribution in California; horizontal axis: Age (in years), class midpoints 2 to 97.]

EXERCISES

The means of 36 randomly selected samples generated by technology with
n = 40 are shown below.

28.14, 31.56, 36.86, 32.37, 36.12, 39.53,
36.19, 39.02, 35.62, 36.30, 34.38, 32.98,
36.41, 30.24, 34.19, 44.72, 38.84, 42.87,
38.90, 34.71, 34.13, 38.25, 38.04, 34.07,
39.74, 40.91, 42.63, 35.29, 35.91, 34.36,
36.51, 36.47, 32.88, 37.33, 31.27, 35.80

1. Use technology and the age distribution to estimate the mean age in
California.

2. Use technology to find the mean of the set of 36 sample means. How does it
compare with the mean age in California found in Exercise 1? Does this agree
with the result predicted by the Central Limit Theorem?

3. Are the ages of people in California normally distributed? Explain your
reasoning.

4. Sketch a relative frequency histogram for the 36 sample means. Use nine
classes. Is the histogram approximately bell-shaped and symmetric? Does this
agree with the result predicted by the Central Limit Theorem?

5. Use technology and the age distribution to find the standard deviation of
the ages of people in California.

6. Use technology to find the standard deviation of the set of 36 sample means.
How does it compare with the standard deviation of the ages found in
Exercise 5? Does this agree with the result predicted by the Central Limit
Theorem?

Extended solutions are given in the technology manuals that accompany this text.
Technical instruction is provided for Minitab, Excel, and the TI-84 Plus.
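For the grouped-data questions, the mean and standard deviation weight each class midpoint by its relative frequency. For the questions about the 36 sample means, those values are treated as ordinary data. This is a minimal Python sketch with a hypothetical helper, `grouped_mean_sd`. The helper normalizes the weights so that percentages summing to slightly more or less than 100% because of rounding still work. Choosing the sample (`stdev`) or population (`pstdev`) standard deviation for the 36 means is a judgment call, and the sketch uses `stdev`.

```python
from math import sqrt
from statistics import mean, stdev


def grouped_mean_sd(midpoints, rel_freqs):
    """Mean and population SD of grouped data (weights need not sum to 1)."""
    total = sum(rel_freqs)
    mu = sum(x * f for x, f in zip(midpoints, rel_freqs)) / total
    var = sum(f * (x - mu) ** 2 for x, f in zip(midpoints, rel_freqs)) / total
    return mu, sqrt(var)


sample_means = [
    28.14, 31.56, 36.86, 32.37, 36.12, 39.53,
    36.19, 39.02, 35.62, 36.30, 34.38, 32.98,
    36.41, 30.24, 34.19, 44.72, 38.84, 42.87,
    38.90, 34.71, 34.13, 38.25, 38.04, 34.07,
    39.74, 40.91, 42.63, 35.29, 35.91, 34.36,
    36.51, 36.47, 32.88, 37.33, 31.27, 35.80,
]
print(round(mean(sample_means), 2), round(stdev(sample_means), 2))
```

To apply `grouped_mean_sd` to the California table, pass the class midpoints (2, 7, ..., 97) as `midpoints` and the percentages as `rel_freqs`.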


Technology 315 


CHAPTERS 3-5 




1. A survey of adults in the United States found that 61% ate at a restaurant
at least once in the past week. You randomly select 30 adults and ask them
whether they ate at a restaurant at least once in the past week. (Source: Gallup)


(a) Verify that a normal distribution can be used to approximate the 
binomial distribution. 

(b) Find the probability that at most 14 adults ate at a restaurant at least 
once in the past week. 


(c) Is it unusual for exactly 14 out of 30 adults to have eaten in a restaurant 
at least once in the past week? Explain your reasoning. 


In Exercises 2 and 3, find the (a) mean, (b) variance, (c) standard deviation, and 
(d) expected value of the probability distribution. Interpret the results. 


2. The table shows the distribution of household sizes in the United States for
a recent year. (Source: U.S. Census Bureau)

x       1      2      3      4      5      6      7
P(x)    0.281  0.340  0.154  0.129  0.060  0.022  0.013

3. The table shows the distribution of personal fouls per game for Garrett
Temple in a recent NBA season. (Source: National Basketball Association)

x       0      1      2      3      4      5      6
P(x)    0.113  0.188  0.188  0.288  0.150  0.038  0.038
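Exercises 2 and 3 use μ = Σ x P(x), σ² = Σ (x - μ)² P(x), and σ = √σ². This is a minimal Python sketch with a hypothetical helper, shown on the household-size table. The printed probabilities sum to 0.999 because of rounding, so the results are approximate.

```python
from math import sqrt


def discrete_summary(xs, ps):
    """Mean (expected value), variance, and SD of a discrete probability distribution."""
    mu = sum(x * p for x, p in zip(xs, ps))
    var = sum((x - mu) ** 2 * p for x, p in zip(xs, ps))
    return mu, var, sqrt(var)


household_size = [1, 2, 3, 4, 5, 6, 7]
p_household = [0.281, 0.340, 0.154, 0.129, 0.060, 0.022, 0.013]
mu, var, sd = discrete_summary(household_size, p_household)
print(round(mu, 3), round(var, 3), round(sd, 3))
```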


4. Use the probability distribution in Exercise 3 to find the probability of
randomly selecting a game in which Garrett Temple had (a) fewer than four
personal fouls, (b) at least three personal fouls, and (c) between two and four
personal fouls, inclusive.

5. From a pool of 16 candidates, 9 men and 7 women, the offices of president,
vice president, secretary, and treasurer will be filled. (a) In how many
different ways can the offices be filled? (b) What is the probability that all
four of the offices are filled by women?
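The four offices are distinct, so the order of selection matters and Exercise 5 counts permutations: ₁₆P₄ ways in all and ₇P₄ ways using only women. This is a minimal Python sketch, assuming Python 3.8 or later for `math.perm`.

```python
from math import perm

ways = perm(16, 4)        # ordered choice of president, VP, secretary, treasurer
all_women = perm(7, 4)    # offices filled only from the 7 women
print(ways, all_women / ways)
```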


In Exercises 6-11, find the indicated area under the standard normal curve. 
If convenient, use technology to find the area. 


6. To the left of z = 0.72         7. To the left of z = -3.08
8. To the right of z = -0.84       9. Between z = 0 and z = 2.95
10. Between z = -1.22 and z = -0.26
11. To the left of z = 0.12 or to the right of z = 1.72


12. Twenty-eight percent of U.S. adults think that climate scientists understand
the causes of climate change very well. You randomly select 25 U.S. adults.
Find the probability that the number of U.S. adults who think that climate
scientists understand the causes of climate change very well is (a) exactly
three, (b) between 8 and 11, inclusive, and (c) less than two. (d) Are any of
these events unusual? Explain your reasoning. (Source: Pew Research Center)




13. An auto parts seller finds that 1 in every 200 parts sold is defective. Use the
geometric distribution to find the probability that (a) the first defective part
is the fifth part sold, (b) the first defective part is the first, second, or third
part sold, and (c) none of the first 20 parts sold are defective.
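For Exercise 13, the geometric distribution gives P(x) = p q^(x - 1), the probability that the first defective part is the xth part sold. This is a minimal Python sketch with a hypothetical helper. Part (c) uses the fact that no defective part among the first 20 means 20 consecutive non-defective parts, which has probability q^20.

```python
def geometric_pmf(p, x):
    """P(first success occurs on trial x) = p * q^(x - 1)."""
    return p * (1 - p) ** (x - 1)


p = 1 / 200                                                 # 1 in every 200 parts is defective
first_is_fifth = geometric_pmf(p, 5)                        # (a)
within_three = sum(geometric_pmf(p, x) for x in (1, 2, 3))  # (b)
none_in_twenty = (1 - p) ** 20                              # (c)
print(round(first_is_fifth, 4), round(within_three, 4), round(none_in_twenty, 4))
```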


14. The table shows the results of a survey in which 3,405,100 public and
489,900 private school teachers were asked about their full-time teaching
experience. (Adapted from National Center for Education Statistics)


                      Public       Private    Total
Less than 3 years     304,650      90,675     395,325
3 to 9 years          1,127,205    145,545    1,272,750
10 to 20 years        1,232,140    128,805    1,360,945
More than 20 years    721,005      99,510     820,515
Total                 3,385,000    464,535    3,849,535


(a) Find the probability that a randomly selected private school teacher has 
10 to 20 years of full-time teaching experience. 


(b) Find the probability that a randomly selected teacher is at a public 
school, given that the teacher has 3 to 9 years of full-time experience. 


(c) Are the events “being a public school teacher” and “having more than 
20 years of full-time teaching experience” independent? Explain. 


(d) Find the probability that a randomly selected teacher has 3 to 9 years of 
full-time teaching experience or is at a private school. 
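You can check Exercise 14 in Python by storing the table and applying the definitions of conditional probability, independence, and the Addition Rule. This is a minimal sketch. The dictionary layout is chosen for illustration, and the totals are recomputed from the cells rather than copied.

```python
table = {
    # experience: (public, private)
    "Less than 3 years": (304_650, 90_675),
    "3 to 9 years": (1_127_205, 145_545),
    "10 to 20 years": (1_232_140, 128_805),
    "More than 20 years": (721_005, 99_510),
}
public_total = sum(pub for pub, _ in table.values())
private_total = sum(pri for _, pri in table.values())
grand_total = public_total + private_total

# (a) P(10 to 20 years | private)
p_a = table["10 to 20 years"][1] / private_total
# (b) P(public | 3 to 9 years)
p_b = table["3 to 9 years"][0] / sum(table["3 to 9 years"])
# (c) independent only if P(public | more than 20 years) equals P(public)
p_public = public_total / grand_total
p_public_given_20 = table["More than 20 years"][0] / sum(table["More than 20 years"])
# (d) P(3 to 9 years or private) = P(A) + P(B) - P(A and B)
p_d = (sum(table["3 to 9 years"]) + private_total - table["3 to 9 years"][1]) / grand_total
print(p_a, p_b, p_public, p_public_given_20, p_d)
```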


15. The initial pressures for bicycle tires when first filled are normally distributed,
with a mean of 70 pounds per square inch (psi) and a standard deviation
of 1.2 psi.


(a) Random samples of size 40 are drawn from this population, and the 
mean of each sample is determined. Find the mean and standard 
deviation of the sampling distribution of sample means. Then sketch a 
graph of the sampling distribution. 


(b) A random sample of 15 tires is drawn from this population. What is the 
probability that the mean tire pressure of the sample is less than 69 psi? 


16. The life spans of car batteries are normally distributed, with a mean of
44 months and a standard deviation of 5 months.


(a) Find the probability that the life span of a randomly selected battery is 
less than 36 months. 


(b) Find the probability that the life span of a randomly selected battery is 
between 42 and 60 months. 


(c) What is the shortest life expectancy a car battery can have and still be in 
the top 5% of life expectancies? 


17. A florist has 12 different flowers from which floral arrangements can be
made. A centerpiece is made using four different flowers. (a) How many
different centerpieces can be made? (b) What is the probability that the four
flowers in the centerpiece are roses, daisies, hydrangeas, and lilies?


18. Seventy percent of U.S. adults anticipate major cyberattacks on public
infrastructure in the next five years. You randomly select 10 U.S. adults.
(a) Construct a binomial distribution for the random variable x, the number
of U.S. adults who anticipate major cyberattacks on public infrastructure in
the next five years. (b) Graph the binomial distribution using a histogram and
describe its shape. (c) Identify any values of the random variable x that you
would consider unusual. Explain. (Source: Pew Research Center)


Cumulative Review 317 


CHAPTER 6 


David Wechsler was one of the most influential psychologists of the 20th century. He is 
known for developing intelligence tests, such as the Wechsler Adult Intelligence Scale and 
the Wechsler Intelligence Scale for Children. 




Confidence Intervals for the
Mean (σ Known)


Confidence Intervals for the
Mean (σ Unknown)


Activity 
Case Study 


Confidence Intervals for 
Population Proportions 


Activity 


Confidence Intervals for 
Variance and Standard 
Deviation 


Uses and Abuses 
Real Statistics—Real Decisions 
Technology 


Where You've Been


In Chapters 1 through 5, you studied descriptive statistics 
(how to collect and describe data) and probability (how 
to find probabilities and analyze discrete and continuous 
probability distributions). For instance, psychologists use 
descriptive statistics to analyze the data collected during 
experiments and tests. 


Where You're Going


One of the most commonly administered psychological 
tests is the Wechsler Adult Intelligence Scale. It is an 
intelligence quotient (IQ) test that is standardized to have 
a normal distribution with a mean of 100 and a standard 
deviation of 15. 


In this chapter, you will begin your study of inferential 
statistics—the second major branch of statistics. For 
instance, a chess club wants to estimate the mean IQ of
its members. The mean of a random sample of members 
is 115. Because this estimate consists of a single number 
represented by a point on a number line, it is called a point 
estimate. The problem with using a point estimate is that 
it is rarely equal to the exact parameter (mean, standard 
deviation, or proportion) of the population. 


In this chapter, you will learn how to make a more 
meaningful estimate by specifying an interval of values 
on a number line, together with a statement of how 
confident you are that your interval contains the population 
parameter. Suppose the club wants to be 90% confident 
of its estimate for the mean IQ of its members. Here is an
overview of how to construct an interval estimate. 


Find the mean of a random sample: x̄ = 115.
Find the margin of error: E = 3.3.
Find the interval endpoints. Left: 115 - 3.3 = 111.7. Right: 115 + 3.3 = 118.3.
Form the interval estimate: 111.7 < μ < 118.3.

[Figure: number line showing the interval from 111.7 to 118.3, centered at 115, with 3.3 on each side.]


So, the club can be 90% confident that the mean IQ of its members is between 111.7 and 118.3.




320 CHAPTER 6 Confidence Intervals 


6.1 Confidence Intervals for the Mean (σ Known)


What You Should Learn

» How to find a point estimate and a margin of error
» How to construct and interpret confidence intervals for a population mean
when σ is known
» How to determine the minimum sample size required when estimating a
population mean

Estimating Population Parameters • Confidence Intervals for a Population Mean • Sample Size


Estimating Population Parameters


In this chapter, you will learn an important technique of statistical inference: to
use sample statistics to estimate the value of an unknown population parameter.
In this section and the next, you will learn how to use sample statistics to make an
estimate of the population parameter μ when the population standard deviation σ
is known (this section) or when σ is unknown (Section 6.2). To make such an
inference, begin by finding a point estimate.


DEFINITION 


A point estimate is a single value estimate for a population parameter. The
most unbiased point estimate of the population mean μ is the sample mean x̄.


The validity of an estimation method is increased when you use a sample 
statistic that is unbiased and has low variability. A statistic is unbiased if it does 
not overestimate or underestimate the population parameter. In Chapter 5, you 
learned that the mean of all possible sample means of the same size equals the 
population mean. As a result, x̄ is an unbiased estimator of μ. Increasing n
decreases the standard error σ/√n of the sample mean, so x̄ becomes less
variable.


Finding a Point Estimate 


A researcher is collecting data about a college athletic conference and its
student-athletes. A random sample of 40 student-athletes is selected and
their numbers of hours spent on required athletic activities for one week
are recorded (see table at left). Find a point estimate for the population
mean μ, the mean number of hours spent on required athletic activities by all
student-athletes in the conference. (Adapted from Penn Schoen Berland)

Number of hours
19 25 15 21 22 20 20 22
22 21 21 23 22 16 21 18
25 23 23 21 22 24 18 19
23 20 19 19 24 25 17 21
21 25 23 18 22 20 21 21


SOLUTION 
The sample mean of the data is

$\bar{x} = \frac{\sum x}{n} = \frac{842}{40} \approx 21.1.$


So, the point estimate for the mean number of hours spent on required athletic 
activities by all student-athletes in the conference is about 21.1 hours. 
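You can reproduce the computation in Example 1 with Python's `statistics.mean`. This is a minimal sketch using the 40 values from the table.

```python
from statistics import mean

hours = [
    19, 25, 15, 21, 22, 20, 20, 22,
    22, 21, 21, 23, 22, 16, 21, 18,
    25, 23, 23, 21, 22, 24, 18, 19,
    23, 20, 19, 19, 24, 25, 17, 21,
    21, 25, 23, 18, 22, 20, 21, 21,
]
x_bar = mean(hours)  # point estimate of the population mean
print(sum(hours), len(hours), round(x_bar, 1))
```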


TRY IT YOURSELF 1
In Example 1, the researcher selects a second random sample of 30
student-athletes and records their numbers of hours spent on required athletic
activities (see table at left). Use this sample to find another point estimate of
the population mean μ. (Adapted from Penn Schoen Berland)
Answer: Page A36

Number of hours
1°17 21 18 23 25
22 21 23 22 19 20
20 23 20 22 21 19
20 21 23 16 19 20
21 20 22 19 24 24


Study Tip 

In this text, you will usually 
use 90%, 95%, and 99% 
levels of confidence. Here 
are the z-scores that 
correspond to these levels 
of confidence. 


Level of Confidence    z_c
90%                    1.645
95%                    1.96
99%                    2.575


SECTION 6.1 Confidence Intervals for the Mean (σ Known) 321


In Example 1, the probability that the population mean is exactly 21.1
is virtually zero. So, instead of estimating μ to be exactly 21.1 using a point
estimate, you can estimate that μ lies in an interval. This is called making an
interval estimate.


DEFINITION 


An interval estimate is an interval, or range of values, used to estimate a
population parameter.


Although you can assume that the point estimate in Example 1 is not equal 
to the actual population mean, it is probably close to it. To form an interval 
estimate, use the point estimate as the center of the interval, and then add and 
subtract a margin of error. For instance, if the margin of error is 0.6, then an 
interval estimate would be given by 


$21.1 \pm 0.6$ or $20.5 < \mu < 21.7$.
The point estimate and interval estimate are shown in the figure. 


[Figure: Interval Estimate. Number line from 20 to 22 marking the left endpoint 20.5, the point estimate x̄ = 21.1, and the right endpoint 21.7.]


Before finding a margin of error for an interval estimate, you should first
determine how confident you need to be that your interval estimate contains the
population mean μ.


DEFINITION 


The level of confidence c is the probability that the interval estimate contains
the population parameter, assuming that the estimation process is repeated a
large number of times.


You know from the Central Limit Theorem that when n ≥ 30, the sampling
distribution of sample means approximates a normal distribution. The level of
confidence c is the area under the standard normal curve between the critical
values, -z_c and z_c. Critical values are values that separate sample statistics that
are probable from sample statistics that are improbable, or unusual. You can see
from the figure shown below that c is the percent of the area under the normal
curve between -z_c and z_c. The area remaining is 1 - c, so the area in one tail is
½(1 - c).


For instance, if c = 90%, then 5% of the area lies to the left of -z_c = -1.645
and 5% lies to the right of z_c = 1.645, as shown in the table.


If c = 90%:
c = 0.90              Area in blue region
1 - c = 0.10          Area in yellow regions
½(1 - c) = 0.05       Area in one tail
-z_c = -1.645         Critical value separating left tail
z_c = 1.645           Critical value separating right tail
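The critical value z_c leaves area ½(1 - c) in each tail, so it is the z-score with cumulative area 1 - ½(1 - c). This is a minimal Python sketch, assuming Python 3.8 or later for `statistics.NormalDist`. Technology gives about 2.576 for 99%, and the Study Tip lists the rounded table value 2.575.

```python
from statistics import NormalDist


def critical_value(c):
    """z_c such that the area between -z_c and z_c under the standard normal curve is c."""
    tail = (1 - c) / 2  # area in one tail
    return NormalDist().inv_cdf(1 - tail)


for c in (0.90, 0.95, 0.99):
    print(c, round(critical_value(c), 3))
```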




The difference between the point estimate and the actual parameter value
is called the sampling error. When μ is estimated, the sampling error is the
difference x̄ - μ. In most cases, of course, μ is unknown, and x̄ varies from
sample to sample. However, you can calculate a maximum value for the error
when you know the level of confidence and the sampling distribution.

Picturing the World

A survey of a random sample of 1000 smartphone owners found that the mean
daily time spent communicating on a smartphone was 131.4 minutes. From
previous studies, it is assumed that the population standard deviation is
21.2 minutes. Communicating on a smartphone includes text, email, social
media, and phone calls. (Adapted from International Data Corporation)


DEFINITION 


Given a level of confidence c, the margin of error E (sometimes also called
the maximum error of estimate or error tolerance) is the greatest possible
distance between the point estimate and the value of the parameter it is
estimating. For a population mean μ where σ is known, the margin of error is

$E = z_c \sigma_{\bar{x}} = z_c \frac{\sigma}{\sqrt{n}}$        Margin of error for μ (σ known)

when these conditions are met.

1. The sample is random.

2. At least one of the following is true: The population is normally
distributed or n ≥ 30. (Recall from the Central Limit Theorem that when
n ≥ 30, the sampling distribution of sample means approximates a normal
distribution.)


[Figure: histogram "Daily Time Spent on Smartphone"; horizontal axis: Minutes; vertical axis: Frequency.]
For a 95% confidence interval, what would be the margin of error for the
population mean daily time spent communicating on a smartphone?


Finding the Margin of Error

Use the data in Example 1 and a 95% confidence level to find the margin of
error for the mean number of hours spent on required athletic activities by all
student-athletes in the conference. Assume the population standard deviation
is 2.3 hours.

SOLUTION


Because σ is known (σ = 2.3), the sample is random (see Example 1), and
n = 40 ≥ 30, use the formula for E given above. The z-score that corresponds
to a 95% confidence level is 1.96. This implies that 95% of the area under
the standard normal curve falls within 1.96 standard deviations of the mean,
as shown in the figure below. (You can approximate the distribution of the
sample means with a normal curve by the Central Limit Theorem because
n = 40 ≥ 30.)

Using the values z_c = 1.96, σ = 2.3, and n = 40,

$E = z_c \frac{\sigma}{\sqrt{n}} = 1.96 \cdot \frac{2.3}{\sqrt{40}} \approx 0.7.$

[Figure: standard normal curve with area 0.95 between -z_c = -1.96 and z_c = 1.96, centered at z = 0, and area 0.025 in each tail.]


Interpretation You are 95% confident that the margin of error for the 
population mean is about 0.7 hour. 


TRY IT YOURSELF 2 


Use the data in Try It Yourself 1 and a 95% confidence level to find the margin 
of error for the mean number of hours spent on required athletic activities 
by all student-athletes in the conference. Assume the population standard 
deviation is 2.3 hours. Answer: Page A36 


Study Tip 


When you construct 
a confidence interval 
for a population mean, the general 
round-off rule is to round off to the 
same number of decimal places as 
the sample mean. 


Study Tip
Other ways to represent a confidence interval are (x̄ − E, x̄ + E) and x̄ ± E. For instance, in Example 3, you could write the confidence interval as (20.4, 21.8) or 21.1 ± 0.7.


SECTION 6.1. Confidence Intervals for the Mean (a Known) 323 


Confidence Intervals for a Population Mean 


Using a point estimate and a margin of error, you can construct an interval estimate of a population parameter such as μ. This interval estimate is called a confidence interval.


DEFINITION 


A c-confidence interval for a population mean μ is

$$\bar{x} - E < \mu < \bar{x} + E.$$

The probability that the confidence interval contains μ is c, assuming that the estimation process is repeated a large number of times.


GUIDELINES 


Constructing a Confidence Interval for a Population Mean (σ Known)

1. Verify that σ is known, the sample is random, and either the population is normally distributed or n ≥ 30.
2. Find the sample statistics n and x̄. In symbols: x̄ = Σx / n
3. Find the critical value z_c that corresponds to the given level of confidence. In symbols: Use Table 4 in Appendix B.
4. Find the margin of error E. In symbols: E = z_c σ / √n
5. Find the left and right endpoints and form the confidence interval. In symbols: Left endpoint: x̄ − E; Right endpoint: x̄ + E; Interval: x̄ − E < μ < x̄ + E


See Minitab steps 
on page 366. 
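The five guideline steps can be sketched in Python as follows. This is an illustrative helper under the same assumptions as the guidelines (σ known, random sample, normal population or n ≥ 30), with z_c taken from the inverse normal CDF in place of Table 4; `z_interval` and `z_interval_from_data` are hypothetical names.

```python
from math import sqrt
from statistics import NormalDist, fmean


def z_interval(x_bar: float, sigma: float, n: int, c: float):
    """Return (left, right) endpoints of a c-confidence interval for mu when sigma is known."""
    # Step 3: critical value z_c (inverse normal CDF instead of Table 4).
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    # Step 4: margin of error.
    E = z_c * sigma / sqrt(n)
    # Step 5: endpoints.
    return x_bar - E, x_bar + E


def z_interval_from_data(data, sigma: float, c: float):
    """Step 2 from raw data: compute n and the sample mean, then build the interval."""
    return z_interval(fmean(data), sigma, len(data), c)


# Admissions example later in this section: n = 20, x_bar = 22.9, sigma = 1.5, c = 0.90.
left, right = z_interval(22.9, 1.5, 20, 0.90)
print(round(left, 1), round(right, 1))  # 22.3 23.5
```

Rounding the endpoints to the same number of decimal places as the sample mean follows the round-off rule stated in the Study Tip.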


Constructing a Confidence Interval 


Use the data in Example 1 to construct a 95% confidence interval for the mean 
number of hours spent on required athletic activities by all student-athletes in 
the conference. 


SOLUTION 


In Examples 1 and 2, you found that x̄ ≈ 21.1 and E ≈ 0.7. The confidence interval is constructed as shown.

Left Endpoint: x̄ − E ≈ 21.1 − 0.7 = 20.4
Right Endpoint: x̄ + E ≈ 21.1 + 0.7 = 21.8

20.4 < μ < 21.8

[Figure: number line from 20.0 to 22.0 showing the interval from 20.4 to 21.8, with x̄ = 21.1 marked]


Interpretation With 95% confidence, you can say that the population mean 
number of hours spent on required athletic activities is between 20.4 and 
21.8 hours. 


324 CHAPTER 6 Confidence Intervals 


Study Tip 


The width of a confidence interval is 2E. Examine the formula for E to see why a larger sample size tends to give you a narrower confidence interval for the same level of confidence.


Tech Tip 


Here are instructions 
for constructing a 
confidence interval on 
a TI-84 Plus. First, either 
enter the original data 
into a list or enter the 
descriptive statistics. 


STAT
Choose the TESTS menu.
7: ZInterval...

Select the Data input option when you use the original data. Select the Stats input option when you use the descriptive statistics. In each case, enter the appropriate values, then select Calculate. Your results may differ slightly depending on the method you use. For Example 4, the original data were entered.

[TI-84 Plus display: ZInterval output for Example 4]


TRY IT YOURSELF 3 


Use the data in Try It Yourself 1 to construct a 95% confidence interval 
for the mean number of hours spent on required athletic activities by all 
student-athletes in the conference. Compare your result with the interval 
found in Example 3. Answer: Page A36 


Constructing a Confidence Interval Using Technology 


Use the data in Example 1 and technology to construct a 99% confidence 
interval for the mean number of hours spent on required athletic activities by 
all student-athletes in the conference. 


SOLUTION 


Minitab and StatCrunch each have features that allow you to construct a confidence interval. You can construct a confidence interval by entering the original data or by using the descriptive statistics. The original data were used to construct the confidence intervals shown below. From the displays, a 99% confidence interval for μ is (20.1, 22.0). Note that this interval is rounded to the same number of decimal places as the sample mean.


MINITAB

One-Sample Z: Hours
The assumed standard deviation = 2.3

Variable   N    Mean     StDev   SE Mean   99% CI
Hours      40   21.050   2.438   0.364     (20.113, 21.987)


STATCRUNCH

One sample Z confidence interval:
μ: Mean of variable
Standard deviation = 2.3

99% confidence interval results:

Variable   n    Sample Mean   Std. Err.    L. Limit    U. Limit
Hours      40   21.05         0.36366193   20.113269   21.986731


Interpretation With 99% confidence, you can say that the population mean 
number of hours spent on required athletic activities is between 20.1 and 
22.0 hours. 


TRY IT YOURSELF 4 


Use the data in Example 1 and technology to construct 75%, 85%, and 90% 
confidence intervals for the mean number of hours spent on required athletic 
activities by all student-athletes in the conference. How does the width of the 
confidence interval change as the level of confidence increases? 

Answer: Page A36 


In Examples 3 and 4, and Try It Yourself 4, the same sample data were used 
to construct confidence intervals with different levels of confidence. Notice that 
as the level of confidence increases, the width of the confidence interval also 
increases. In other words, when the same sample data are used, the greater the 
level of confidence, the wider the interval. 
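Because the width 2E depends only on c, σ, and n (not on x̄), the pattern above can be checked directly. A minimal sketch, assuming σ = 2.3 and n = 40 as in the athletic-activities example, with z_c taken from the inverse normal CDF:

```python
from math import sqrt
from statistics import NormalDist


def interval_width(c: float, sigma: float, n: int) -> float:
    """Width 2E of a z-interval for the mean (sigma known)."""
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    return 2 * z_c * sigma / sqrt(n)


# Same sigma and n for every level of confidence.
levels = [0.75, 0.85, 0.90, 0.95, 0.99]
widths = [interval_width(c, 2.3, 40) for c in levels]
for c, w in zip(levels, widths):
    print(f"c = {c:.2f}: width = {w:.3f}")

# Holding c fixed, quadrupling the sample size halves the width.
print(interval_width(0.95, 2.3, 160) < interval_width(0.95, 2.3, 40))  # True
```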


Tech Tip

Here are instructions for constructing a confidence interval in Excel. First, click Formulas at the top of the screen and click Insert Function in the Function Library group. Select the category Statistical and select the Confidence.Norm function. In the dialog box, enter the values of alpha, the standard deviation, and the sample size (see below). Then click OK. The value returned is the margin of error, which is used to construct the confidence interval.


[Excel display: =CONFIDENCE.NORM(0.1,1.5,20) returns 0.551700678]


Alpha is the level of significance, which will be explained in Chapter 7. When using Excel in Chapter 6, you can think of alpha as the complement of the level of confidence. So, for a 90% confidence interval, alpha is equal to 1 − 0.90 = 0.10.
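For readers working outside Excel, the same margin of error can be computed in Python. The helper below is our own stand-in (not part of Excel or any Python library); it assumes CONFIDENCE.NORM returns z_c · σ/√n with z_c taken at 1 − alpha/2, which agrees with the value shown in the Excel display.

```python
from math import sqrt
from statistics import NormalDist


def confidence_norm(alpha: float, standard_dev: float, size: int) -> float:
    """Mimic Excel's CONFIDENCE.NORM: the margin of error for a (1 - alpha) z-interval."""
    z_c = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    return z_c * standard_dev / sqrt(size)


print(round(confidence_norm(0.10, 1.5, 20), 6))  # 0.551701
```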


The horizontal segments represent 90% confidence intervals for different samples of the same size. In the long run, 9 of every 10 such intervals will contain μ.




For a normally distributed population with σ known, you may use the normal sampling distribution for any sample size (even when n < 30), as shown in Example 5.


See TI-84 Plus 
steps on page 367. 


Constructing a Confidence Interval 


A college admissions director wishes to estimate the mean age of all students 
currently enrolled. In a random sample of 20 students, the mean age is 
found to be 22.9 years. From past studies, the standard deviation is known 
to be 1.5 years, and the population is normally distributed. Construct a 90% 
confidence interval for the population mean age. 


SOLUTION 


Because σ is known, the sample is random, and the population is normally distributed, use the formula for E given in this section. Using n = 20, x̄ = 22.9, σ = 1.5, and z_c = 1.645, the margin of error at the 90% confidence level is

$$E = z_c \frac{\sigma}{\sqrt{n}} = 1.645 \cdot \frac{1.5}{\sqrt{20}} \approx 0.6.$$


The 90% confidence interval can be written as x̄ ± E ≈ 22.9 ± 0.6 or as shown below.

Left Endpoint: x̄ − E ≈ 22.9 − 0.6 = 22.3
Right Endpoint: x̄ + E ≈ 22.9 + 0.6 = 23.5

[Figure: number line from 22.0 to 24.0 showing the interval from 22.3 to 23.5, with x̄ = 22.9 marked]


Interpretation With 90% confidence, you can say that the mean age of all the 
students is between 22.3 and 23.5 years. 


TRY IT YOURSELF 5 


Construct a 90% confidence interval for the population mean age for the 
college students in Example 5 with the sample size increased to 30 students. 
Compare your answer with Example 5. Answer: Page A36 


After constructing a confidence interval, it is important that you interpret the results correctly. Consider the 90% confidence interval constructed in Example 5. Because μ is a fixed value predetermined by the population, it is either in the interval or not. It is not correct to say, “There is a 90% probability that the actual mean will be in the interval (22.3, 23.5).” This statement is wrong because it suggests that the value of μ can vary, which is not true. The correct way to interpret this confidence interval is to say, “With 90% confidence, the mean is in the interval (22.3, 23.5).” This means that when a large number of samples is collected and a confidence interval is created for each sample, approximately 90% of these intervals will contain μ, as shown in the figure at the left. This correct interpretation refers to the success rate of the process being used.
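The success-rate interpretation can be illustrated by simulation. The sketch below assumes a hypothetical normal population with μ = 23 and σ = 1.5 (values chosen only for illustration), draws many samples of size 20, builds a 90% z-interval from each, and reports the fraction of intervals that contain μ. With many trials the fraction should be close to 0.90, though any single run varies.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage_rate(mu: float, sigma: float, n: int, c: float, trials: int, seed: int = 1) -> float:
    """Fraction of simulated c-confidence intervals (sigma known) that contain mu."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    E = z_c * sigma / sqrt(n)  # same margin of error for every sample
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        if x_bar - E < mu < x_bar + E:
            hits += 1
    return hits / trials


# Hypothetical population: normal with mu = 23 and sigma = 1.5; samples of size 20.
print(coverage_rate(23.0, 1.5, 20, 0.90, trials=5000))  # typically close to 0.90
```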




Sample Size 


For the same sample statistics, as the level of confidence increases, the confidence 
interval widens. As the confidence interval widens, the precision of the estimate 
decreases. One way to improve the precision of an estimate without decreasing 
the level of confidence is to increase the sample size. But how large a sample size 
is needed to guarantee a certain level of confidence for a given margin of error? 
By using the formula for the margin of error

$$E = z_c \frac{\sigma}{\sqrt{n}},$$

a formula can be derived (see Exercise 59) to find the minimum sample size n, as shown in the next definition.


Finding a Minimum Sample Size to Estimate μ


Given a c-confidence level and a margin of error E, the minimum sample size n needed to estimate the population mean μ is

$$n = \left( \frac{z_c \sigma}{E} \right)^2.$$

If n is not a whole number, then round n up to the next whole number (see Example 6). Also, when σ is unknown, you can estimate it using s, provided you have a preliminary random sample with at least 30 members.
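The minimum-sample-size formula, including the rule to round up, translates directly into code. A minimal sketch, assuming z_c comes from the inverse normal CDF rather than Table 4; in borderline cases this can change the rounded-up n by one compared with using a table value such as 1.96.

```python
from math import ceil
from statistics import NormalDist


def min_sample_size(c: float, sigma: float, E: float) -> int:
    """Smallest whole n with z_c * sigma / sqrt(n) <= E, i.e. (z_c * sigma / E)^2 rounded up."""
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    return ceil((z_c * sigma / E) ** 2)  # always round up, never to the nearest integer


# c = 0.95, sigma = 2.3 hours, E = 0.5 hour (the values used in Example 6).
print(min_sample_size(0.95, 2.3, 0.5))  # 82
```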


Determining a Minimum Sample Size 


The researcher in Example 1 wants to estimate the mean number of hours 
spent on required athletic activities by all student-athletes in the conference. 
How many student-athletes must be included in the sample to be 95% 
confident that the sample mean is within 0.5 hour of the population mean? 


SOLUTION 


Using c = 0.95, z_c = 1.96, σ = 2.3 (from Example 2), and E = 0.5, you can solve for the minimum sample size n.

$$n = \left( \frac{z_c \sigma}{E} \right)^2 = \left( \frac{1.96 \cdot 2.3}{0.5} \right)^2 \approx 81.29$$

Because n is not a whole number, round up to 82. So, the researcher needs at least 82 student-athletes in the sample.


Interpretation The researcher already has 40 student-athletes, so the 
sample needs 42 more members. Note that 82 is the minimum number of 
student-athletes to include in the sample. The researcher could include more, 
if desired. 


TRY IT YOURSELF 6 


In Example 6, how many student-athletes must the researcher include in the 
sample to be 95% confident that the sample mean is within 0.75 hour of the 
population mean? Compare your answer with Example 6. 

Answer: Page A36 




6.1 EXERCISES    For Extra Help: MyLab Statistics


Building Basic Skills and Vocabulary 


1. When estimating a population mean, are you more likely to be correct when 
you use a point estimate or an interval estimate? Explain your reasoning. 


2. Which statistic is the best unbiased estimator for μ?
(a) s (b) x̄ (c) the median (d) the mode
3. For the same sample statistics, which level of confidence would produce the widest confidence interval? Explain your reasoning.
(a) 90% (b) 95% (c) 98% (d) 99%
4. You construct a 95% confidence interval for a population mean using a random sample. The confidence interval is 24.9 < μ < 31.5. Is the probability that μ is in this interval 0.95? Explain.


In Exercises 5-8, find the critical value z_c necessary to construct a confidence interval at the level of confidence c.


5. c = 0.80 6. c = 0.85 7. c = 0.75 8. c = 0.97 


Graphical Analysis In Exercises 9-12, use the values on the number line to find the sampling error.

9. x̄ = 3.8, μ = 4.27 (number line from 3.4 to 4.6)
10. μ = 8.76, x̄ = 9.5 (number line from 8.6 to 9.8)
11. μ = 24.67, x̄ = 26.43 (number line from 24 to 27)
12. x̄ = 46.56, μ = 48.12 (number line from 46 to 49)


In Exercises 13-16, find the margin of error for the values of c, σ, and n.

13. c = 0.95, σ = 5.2, n = 30    14. c = 0.90, σ = 2.9, n = 50

15. c = 0.80, σ = 1.3, n = 75    16. c = 0.975, σ = 4.6, n = 100
Matching In Exercises 17-20, match the level of confidence c with the appropriate confidence interval. Assume each confidence interval is constructed for the same sample statistics.

17. c = 0.88    18. c = 0.90    19. c = 0.95    20. c = 0.98

(a) 54.9 to 59.5, centered at 57.2
(b) 55.2 to 59.2, centered at 57.2
(c) 55.6 to 58.8, centered at 57.2
(d) 55.5 to 58.9, centered at 57.2
(Each interval is drawn on a number line from 54 to 60.)


In Exercises 21-24, construct the indicated confidence interval for the population mean μ.

21. c = 0.90, x̄ = 12.3, σ = 1.5, n = 50
22. c = 0.95, x̄ = 31.39, σ = 0.80, n = 82
23. c = 0.99, x̄ = 10.50, σ = 2.14, n = 45
24. c = 0.80, x̄ = 20.6, σ = 4.7, n = 100




In Exercises 25-28, use the confidence interval to find the margin of error and the 
sample mean. 


25. (12.0, 14.8) 26. (21.61, 30.15) 
27. (1.71, 2.05) 28. (3.144, 3.176) 


In Exercises 29-32, determine the minimum sample size n needed to estimate μ for the values of c, σ, and E.

29. c = 0.90, σ = 6.8, E = 1    30. c = 0.95, σ = 2.5, E = 1
31. c = 0.80, σ = 4.1, E = 2    32. c = 0.98, σ = 10.1, E = 2


Using and Interpreting Concepts 


Finding the Margin of Error In Exercises 33 and 34, use the confidence interval to find the estimated margin of error. Then find the sample mean.


33. Commute Times A government agency reports a confidence interval of 
(26.2, 30.1) when estimating the mean commute time (in minutes) for the 
population of workers in a city. 


34. Book Prices A store manager reports a confidence interval of 
(244.07, 280.97) when estimating the mean price (in dollars) for the 
population of textbooks. 


Constructing Confidence Intervals In Exercises 35-38, you are given the sample mean and the population standard deviation. Use this information to construct 90% and 95% confidence intervals for the population mean. Interpret the results and compare the widths of the confidence intervals.


35. Oil Prices From a random sample of 48 business days from November 
14, 2017 through January 23, 2018, London’s crude oil prices had a mean 
of $59.23. Assume the population standard deviation is $2.79. (Source: Live 
Charts UK) 


36. Stock Prices From a random sample of 36 business days from January 12, 2017 through January 12, 2018, the mean closing price of Egyptian iron and steel stock was 5.62 Egyptian pounds (EGP). Assume the population standard deviation is 1.91 EGP. (Source: Cairo Stock Exchange)


37. Maximum Daily Rainfall From a random sample of 64 dates, the mean maximum daily rainfall recorded in the Changi area of Singapore was 10.22 mm. Assume the population standard deviation is 15.77 mm. (Source: Meteorological Service Singapore)


38. Fire Accidents per Year From a random sample of 24 years from 1923 through 2004, the mean number of fire accidents per year in Japan was about 43,202. Assume the population standard deviation is 20,469. (Source: Statistics Bureau of Japan)


39. In Exercise 35, does it seem possible that the population mean could equal 
the sample mean? Explain. 


40. In Exercise 36, does it seem possible that the population mean could be 
within 1% of the sample mean? Explain. 


41. In Exercise 37, does it seem possible that the population mean could be 
greater than 14 millimeters? Explain. 


42. In Exercise 38, does it seem possible that the population mean could be less 
than 35,000? Explain. 




43. When all other quantities remain the same, how does the indicated change 
affect the width of a confidence interval? Explain. 


(a) Increase in the level of confidence 
(b) Increase in the sample size 
(c) Increase in the population standard deviation 


44. Describe how you would construct a 90% confidence interval to estimate the 
population mean age for students at your school. 


Constructing Confidence Intervals In Exercises 45 and 46, use the information to construct 90% and 99% confidence intervals for the population mean. Interpret the results and compare the widths of the confidence intervals.


45. Sodium in Branded Cereals A group of researchers wants to estimate the mean quantity of sodium (in milligrams) per serving of branded cereals. To do so, the group takes a random sample of 30 branded cereals and obtains the quantities (in milligrams) listed below.


130 15 260 140 200 180 125 210 200 210 220 290 210 140 180
280 290 90 180 140 80 220 140 190 125 200 0 160 240 135


From past studies, the research council assumes that σ is 70.7 milligrams. (Adapted from StatCrunch Surveys)


46. Sodium Chloride Concentrations The sodium chloride concentrations (in grams per liter) for 36 randomly selected seawater samples are listed. Assume that σ is 7.61 grams per liter.


30.63 33.47 26.76 15.23 13.21 10.57 
16.57 27.32 27.06 15.07 28.98 34.66 
10.22 22.43 17.33 28.40 35.70 14.09 
11.77 33.60 27.09 26.78 22.39 30.35 
11.83 13.05 22.22 13.45 18.86 24.92 
32.86 31.10 18.84 10.86 15.69 22.35 


47. Determining a Minimum Sample Size Determine the minimum sample size required when you want to be 95% confident that the sample mean is within one unit of the population mean and σ = 4.8. Assume the population is normally distributed.


48. Determining a Minimum Sample Size Determine the minimum sample size required when you want to be 99% confident that the sample mean is within two units of the population mean and σ = 1.4. Assume the population is normally distributed.


49. Cholesterol Contents of Cheese A cheese processing company wants to 
estimate the mean cholesterol content of all one-ounce servings of a type of 
cheese. The estimate must be within 0.75 milligram of the population mean. 


(a) Determine the minimum sample size required to construct a 95% 
confidence interval for the population mean. Assume the population 
standard deviation is 3.10 milligrams. 


(b) The sample mean is 29 milligrams. Using the minimum sample size with 
a 95% level of confidence, does it seem possible that the population 
mean could be within 3% of the sample mean? within 0.3% of the 
sample mean? Explain. 




FIGURE FOR EXERCISE 51: Volume = 1 gal (128 oz); Error tolerance = 0.5 oz

FIGURE FOR EXERCISE 52: Volume = 1/2 gal (64 fl oz); Error tolerance = 0.25 fl oz


50. Ages of College Students An admissions director wants to estimate the mean age of all students enrolled at a college. The estimate must be within 1.5 years of the population mean. Assume the population of ages is normally distributed.

(a) Determine the minimum sample size required to construct a 90% confidence interval for the population mean. Assume the population standard deviation is 1.6 years.

(b) The sample mean is 20 years of age. Using the minimum sample size with a 90% level of confidence, does it seem possible that the population mean could be within 7% of the sample mean? within 8% of the sample mean? Explain.


51. Paint Can Volumes A paint manufacturer uses a machine to fill gallon cans with paint (see figure at the left). The manufacturer wants to estimate the mean volume of paint the machine is putting in the cans within 0.5 ounce. Assume the population of volumes is normally distributed.

(a) Determine the minimum sample size required to construct a 90% confidence interval for the population mean. Assume the population standard deviation is 0.75 ounce.

(b) The sample mean is 127.75 ounces. With a sample size of 8, a 90% level of confidence, and a population standard deviation of 0.75 ounce, does it seem possible that the population mean could be exactly 128 ounces? Explain.


52. Juice Dispensing Machine A beverage company uses a machine to fill half-gallon bottles with fruit juice (see figure at the left). The company wants to estimate the mean volume of juice the machine is putting in the bottles within 0.25 fluid ounce.

(a) Determine the minimum sample size required to construct a 95% confidence interval for the population mean. Assume the population standard deviation is 1 fluid ounce.

(b) The sample mean is exactly 64 fluid ounces. With a sample size of 68, a 95% level of confidence, and a population standard deviation of 1 fluid ounce, does it seem possible that the population mean could be greater than 63.85 fluid ounces? Explain.


53. Soccer Balls A soccer ball manufacturer wants to estimate the mean circumference of soccer balls within 0.15 inch.

(a) Determine the minimum sample size required to construct a 99% confidence interval for the population mean. Assume the population standard deviation is 0.5 inch.

(b) The sample mean is 27.5 inches. With a sample size of 84, a 99% level of confidence, and a population standard deviation of 0.5 inch, does it seem possible that the population mean could be less than 27.6 inches? Explain.


54. Tennis Balls A tennis ball manufacturer wants to estimate the mean circumference of tennis balls within 0.05 inch. Assume the population of circumferences is normally distributed.

(a) Determine the minimum sample size required to construct a 99% confidence interval for the population mean. Assume the population standard deviation is 0.10 inch.

(b) The sample mean is 8.3 inches. With a sample size of 34, a 99% level of confidence, and a population standard deviation of 0.10 inch, does it seem possible that the population mean could be exactly 8.258 inches? Explain.




55. When estimating the population mean, why not construct a 99% confidence interval every time?

56. When all other quantities remain the same, how does the indicated change affect the minimum sample size requirement? Explain.

(a) Increase in the level of confidence

(b) Increase in the error tolerance

(c) Increase in the population standard deviation


Extending Concepts 


Finite Population Correction Factor In Exercises 57 and 58, use the information below.


In this section, you studied the construction of a confidence interval to estimate a population mean. In each case, the underlying assumption was that the sample size n was small in comparison to the population size N. When n ≥ 0.05N, however, the formula that determines the standard error of the mean σ_x̄ needs to be adjusted, as shown below.

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$$

Recall from the Section 5.4 exercises that the expression √((N − n)/(N − 1)) is called a finite population correction factor. The margin of error is

$$E = z_c \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}.$$
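The correction factor and the adjusted margin of error can be sketched as two small Python functions. This is an illustration only: the function names are hypothetical, z_c comes from the inverse normal CDF rather than Table 4, and the population and sample sizes in the usage lines are made up.

```python
from math import sqrt
from statistics import NormalDist


def fpc(N: int, n: int) -> float:
    """Finite population correction factor sqrt((N - n) / (N - 1))."""
    return sqrt((N - n) / (N - 1))


def margin_of_error_fpc(c: float, sigma: float, n: int, N: int) -> float:
    """E = z_c * (sigma / sqrt(n)) * sqrt((N - n) / (N - 1))."""
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    return z_c * sigma / sqrt(n) * fpc(N, n)


# Hypothetical numbers: a sample of n = 150 drawn from a population of N = 2000.
print(round(fpc(2000, 150), 4))
print(round(margin_of_error_fpc(0.95, 3.0, 150, 2000), 4))
```

Because the factor is always less than 1 when n > 1, the corrected margin of error is smaller than the uncorrected one; for a very large N relative to n, the factor is close to 1 and the two agree.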


57. Determine the finite population correction factor for each value of N and n.

(a) N = 1000 and n = 500    (b) N = 1000 and n = 100
(c) N = 1000 and n = 75     (d) N = 1000 and n = 50
(e) N = 100 and n = 50      (f) N = 400 and n = 50
(g) N = 700 and n = 50      (h) N = 1200 and n = 50

What happens to the finite population correction factor as the sample size n decreases but the population size N remains the same? as the population size N increases but the sample size n remains the same?


58. Use the finite population correction factor to construct each confidence interval for the population mean.

(a) c = 0.99, x̄ = 8.6, σ = 4.9, N = 200, n = 25

(b) c = 0.90, x̄ = 10.9, σ = 2.8, N = 500, n = 50

(c) c = 0.95, x̄ = 40.3, σ = 0.5, N = 300, n = 68

(d) c = 0.80, x̄ = 56.7, σ = 9.8, N = 400, n = 36


59. Sample Size The equation for determining the sample size

$$n = \left( \frac{z_c \sigma}{E} \right)^2$$

can be obtained by solving the equation for the margin of error

$$E = \frac{z_c \sigma}{\sqrt{n}}$$

for n. Show that this is true and justify each step.




What You Should Learn

• How to interpret the t-distribution and use a t-distribution table
• How to construct and interpret confidence intervals for a population mean when σ is not known


Study Tip

Here is an example that illustrates the concept of degrees of freedom.

The number of chairs in a classroom equals the number of students: 25 chairs and 25 students. Each of the first 24 students to enter the classroom has a choice of which chair to sit in. There is no freedom of choice, however, for the 25th student who enters the room.


6.2 Confidence Intervals for the Mean (σ Unknown)


The t-Distribution ▪ Confidence Intervals and t-Distributions


The t-Distribution


In many real-life situations, the population standard deviation is unknown. So, how can you construct a confidence interval for a population mean when σ is not known? For a random sample drawn from a normally distributed population, or for a sample of size 30 or more, you can use the sample standard deviation s to estimate the population standard deviation σ. However, when s is used in place of σ, the standardized sample mean (x̄ − μ)/(s/√n) does not follow the standard normal distribution. Instead, it follows a t-distribution.


DEFINITION


If the distribution of a random variable x is approximately normal, then

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

follows a t-distribution. Critical values of t are denoted by t_c. Here are several properties of the t-distribution.

1. The mean, median, and mode of the t-distribution are equal to 0.
2. The t-distribution is bell-shaped and symmetric about the mean.
3. The total area under the t-distribution curve is equal to 1.
4. The tails in the t-distribution are “thicker” than those in the standard normal distribution.
5. The standard deviation of the t-distribution varies with the sample size, but it is greater than 1.


The t-distribution is a family of curves, each determined by a parameter called the degrees of freedom. The degrees of freedom (sometimes abbreviated as d.f.) are the number of free choices left after a sample statistic such as x̄ is calculated. When you use a t-distribution to estimate a population mean, the degrees of freedom are equal to one less than the sample size.


d.f. = n − 1    Degrees of freedom


As the degrees of freedom increase, the t-distribution approaches the standard normal distribution, as shown in the figure. For 30 or more degrees of freedom, the t-distribution is close to the standard normal distribution.
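The thicker tails, and the way the curves approach the standard normal curve as d.f. increases, can be seen by comparing heights far out in a tail. The sketch below implements the t density from its standard closed form (using `math.lgamma` to avoid overflow for large d.f.) and compares it with `statistics.NormalDist().pdf` at t = 3; the choice of t = 3 and of the d.f. values is arbitrary.

```python
from math import exp, lgamma, pi, sqrt
from statistics import NormalDist


def t_pdf(x: float, df: int) -> float:
    """Density of the t-distribution with df degrees of freedom."""
    # Gamma((df + 1) / 2) / (sqrt(df * pi) * Gamma(df / 2)), computed on the log scale.
    coef = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


std_normal = NormalDist()
for df in (2, 5, 14, 30, 100):
    # Height of each t curve at t = 3 compared with the standard normal curve.
    print(f"d.f. = {df:>3}: t density at 3 = {t_pdf(3, df):.5f}, "
          f"normal density at 3 = {std_normal.pdf(3):.5f}")
```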


Critical values in the t-distribution table for a specific confidence interval can be found in the column headed by c in the appropriate d.f. row. (The symbol α will be explained in Chapter 7.)


[TI-84 Plus display: invT(.975,14) ≈ 2.145]


SECTION 6.2 Confidence Intervals for the Mean (a Unknown) 333 


Table 5 in Appendix B lists critical values of t for selected confidence 
intervals and degrees of freedom. 


Finding Critical Values of t
Find the critical value t_c for a 95% confidence level when the sample size is 15.


SOLUTION

Because n = 15, the degrees of freedom are d.f. = n − 1 = 15 − 1 = 14. A portion of Table 5 is shown. Using d.f. = 14 and c = 0.95, you can find the critical value t_c, as shown by the highlighted areas in the table.


Level of confidence, c     0.80    0.90    0.95     0.98     0.99
One tail, α                0.10    0.05    0.025    0.01     0.005
d.f.    Two tails, α       0.20    0.10    0.05     0.02     0.01
1                          3.078   6.314   12.706   31.821   63.657
2                          1.886   2.920   4.303    6.965    9.925
3                          1.638   2.353   3.182    4.541    5.841
⋮
12                         1.356   1.782   2.179    2.681    3.055
13                         1.350   1.771   2.160    2.650    3.012
14                         1.345   1.761   2.145    2.624    2.977
15                         1.341   1.753   2.131    2.602    2.947


From the table, you can see that t_c = 2.145. The figure shows the t-distribution for 14 degrees of freedom, c = 0.95, and t_c = 2.145.


[Figure: t-distribution with 14 degrees of freedom and central area c = 0.95 between −t_c = −2.145 and t_c = 2.145]


You can use technology to find t_c. To use a TI-84 Plus, you need to know the area under the curve to the left of t_c, which is


0.95 + 0.025 = 0.975.    Area to the left of t_c


From the TI-84 Plus display at the left, t_c ≈ 2.145.


Interpretation So, for a t-distribution curve with 14 degrees of freedom, 95% of the area under the curve lies between t = ±2.145.
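If neither Table 5 nor a calculator is at hand, t_c can be approximated numerically. The following is a rough standard-library sketch, not a description of how a TI-84 or any statistics package computes invT: it integrates the t density with Simpson's rule and then bisects for the point with area (1 + c)/2 to its left. Accuracy is adequate for moderate degrees of freedom; in practice a library routine such as SciPy's `scipy.stats.t.ppf` is the better choice.

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(x: float, df: int) -> float:
    """Density of the t-distribution with df degrees of freedom."""
    coef = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|] (steps must be even), then symmetry about 0."""
    if x == 0:
        return 0.5
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the curve between 0 and |x|
    return 0.5 + area if x > 0 else 0.5 - area


def t_critical(c: float, df: int) -> float:
    """Return t_c with area c between -t_c and t_c, found by bisection on the CDF."""
    target = (1 + c) / 2  # area to the left of t_c, e.g. 0.975 when c = 0.95
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # widen the bracket until it contains t_c
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(t_critical(0.95, 14), 3))  # 2.145
```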


TRY IT YOURSELF 1 


Find the critical value t, for a 90% confidence level when the sample size is 22. 
Answer: Page A36 


When the number of degrees of freedom you need is not in the table, use the closest number in the table that is less than the value you need (or use technology, as shown in Example 1). For instance, for d.f. = 57, use 50 degrees of freedom. This conservative approach will yield a larger confidence interval with a slightly higher level of confidence c.




Confidence Intervals and t-Distributions


Constructing a confidence interval for μ when σ is not known using the t-distribution is similar to constructing a confidence interval for μ when σ is known using the standard normal distribution: both use a point estimate x̄ and a margin of error E. When σ is not known, the margin of error E is calculated using the sample standard deviation s and the critical value t_c. So, the formula for E is

$$E = t_c \frac{s}{\sqrt{n}} \qquad \text{Margin of error for } \mu \ (\sigma \text{ unknown})$$

Before using this formula, verify that the sample is random, and either the population is normally distributed or n ≥ 30.
Study Tip

Remember that you can calculate the sample standard deviation s using the formula

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

or the alternate formula

$$s = \sqrt{\frac{\sum x^2 - (\sum x)^2 / n}{n - 1}}.$$

However, the most convenient way to find the sample standard deviation is to use technology.


GUIDELINES

Constructing a Confidence Interval for a Population Mean (σ Unknown)

1. Verify that σ is not known, the sample is random, and either the population is normally distributed or n ≥ 30.
2. Find the sample statistics n, x̄, and s.
3. Identify the degrees of freedom, the level of confidence c, and the critical value t_c. In symbols: Use Table 5 in Appendix B.
4. Find the margin of error E. In symbols: E = t_c s / √n
5. Find the left and right endpoints and form the confidence interval. In symbols: Left endpoint: x̄ − E; Right endpoint: x̄ + E; Interval: x̄ − E < μ < x̄ + E


See Minitab steps 
on page 366. 
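The formulas for s and the σ-unknown guidelines can be sketched together in Python. This is an illustrative sketch with hypothetical function names: it checks that the definition formula and the alternate formula for s agree with `statistics.stdev`, and it takes t_c as an input (for example, looked up in Table 5 for d.f. = n − 1), since the Python standard library has no t-distribution quantile function.

```python
from math import sqrt
from statistics import fmean, stdev


def sample_sd_definition(data) -> float:
    """s = sqrt( sum((x - x_bar)^2) / (n - 1) )."""
    x_bar = fmean(data)
    return sqrt(sum((x - x_bar) ** 2 for x in data) / (len(data) - 1))


def sample_sd_shortcut(data) -> float:
    """Alternate formula s = sqrt( (sum(x^2) - (sum(x))^2 / n) / (n - 1) )."""
    n = len(data)
    return sqrt((sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1))


def t_interval(x_bar: float, s: float, n: int, t_c: float):
    """Endpoints x_bar - E and x_bar + E, with E = t_c * s / sqrt(n) and t_c for d.f. = n - 1."""
    E = t_c * s / sqrt(n)
    return x_bar - E, x_bar + E


# Coffee-shop example: n = 16, x_bar = 162.0, s = 10.0, t_c = 2.131 (d.f. = 15, c = 0.95).
left, right = t_interval(162.0, 10.0, 16, 2.131)
print(round(left, 1), round(right, 1))  # 156.7 167.3
```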


Constructing a Confidence Interval 


You randomly select 16 coffee shops and measure the temperature of the 
coffee sold at each. The sample mean temperature is 162.0°F with a sample 
standard deviation of 10.0°F. Construct a 95% confidence interval for the 
population mean temperature of coffee sold. Assume the temperatures are 
approximately normally distributed. 


SOLUTION Because σ is unknown, the sample is random, and the temperatures are approximately normally distributed, use the t-distribution. Using n = 16, x̄ = 162.0, s = 10.0, c = 0.95, and d.f. = 15, you can use Table 5 to find that t_c = 2.131. The margin of error at the 95% confidence level is

$$E = t_c \frac{s}{\sqrt{n}} = 2.131 \cdot \frac{10.0}{\sqrt{16}} \approx 5.3.$$

The confidence interval is shown below and in the figure at the left.

Left Endpoint: x̄ − E ≈ 162 − 5.3 = 156.7
Right Endpoint: x̄ + E ≈ 162 + 5.3 = 167.3

156.7 < μ < 167.3

[Figure: number line from 156 to 168 showing the interval from 156.7 to 167.3]


Interpretation With 95% confidence, you can say that the population mean 
temperature of coffee sold is between 156.7°F and 167.3°F. 


To explore this topic further, 
see Activity 6.2 on page 340. 


HISTORICAL REFERENCE 


William S. Gosset (1876-1937) 


Developed the t-distribution
while employed by the 
Guinness Brewing Company 
in Dublin, Ireland. Gosset 
published his findings using 
the pseudonym Student. The 
t-distribution is sometimes 
referred to as Student's 
t-distribution. (See page 57 for 
others who were important in 
the history of statistics.) 


SECTION 6.2 Confidence Intervals for the Mean (σ Unknown) 335


TRY IT YOURSELF 2 


Construct 90% and 99% confidence intervals for the population mean 
temperature of coffee sold in Example 2. 
Answer: Page A36 


See TI-84 Plus 
steps on page 367. 


Constructing a Confidence Interval 

You randomly select 36 cars of the same model that were sold at a car 
dealership and determine the number of days each car sat on the dealership’s 
lot before it was sold. The sample mean is 9.75 days, with a sample standard 
deviation of 2.39 days. Construct a 99% confidence interval for the population 
mean number of days the car model sits on the dealership’s lot. 


SOLUTION Because σ is unknown, the sample is random, and n = 36 ≥ 30, use the t-distribution. Using n = 36, x̄ = 9.75, s = 2.39, c = 0.99, and d.f. = 35, you can use Table 5 to find that \( t_c = 2.724 \). The margin of error at the 99% confidence level is

\[ E = t_c \frac{s}{\sqrt{n}} = 2.724 \cdot \frac{2.39}{\sqrt{36}} \approx 1.09. \]

The confidence interval is constructed as shown.

Left Endpoint: \( \bar{x} - E = 9.75 - 1.09 = 8.66 \)
Right Endpoint: \( \bar{x} + E = 9.75 + 1.09 = 10.84 \)

\[ 8.66 < \mu < 10.84 \]

[Figure: number line from 8 to 11 marking the interval from 8.66 to 10.84 around x̄ = 9.75]


You can check this answer using technology, as shown below. (When using 
technology, your answers may differ slightly from those found using Table 5.) 


One sample T confidence interval:
μ: Mean of population

99% confidence interval results:

Mean   Sample Mean   Std. Err.    DF   L. Limit    U. Limit
μ      9.75          0.39833333   35   8.6650174   10.834983


Interpretation With 99% confidence, you can say that the population mean 
number of days the car model sits on the dealership’s lot is between 8.66 
and 10.84. 


TRY IT YOURSELF 3 


Construct 90% and 95% confidence intervals for the population mean number 
of days the car model sits on the dealership’s lot in Example 3. Compare the 
widths of the confidence intervals. Answer: Page A36 


336 CHAPTER 6 Confidence Intervals 


Picturing the World

Two footballs, one filled with air and the other filled with helium, were kicked on a windless day at Ohio State University. The footballs were alternated with each kick. After 10 practice kicks, each football was kicked 29 more times. The distances (in yards) are listed. (Source: The Columbus Dispatch)


[Figure: stem-and-leaf plots of the 29 distances (in yards) for the air-filled football (Key: 1|9 = 19) and the helium-filled football (Key: 1|1 = 11)]


Assume that the distances are normally distributed for each football. Apply the flowchart at the right to each sample. Construct a 95% confidence interval for the population mean distance each football traveled. Do the confidence intervals overlap? What does this result tell you?


The flowchart describes when to use the standard normal distribution and when to use the t-distribution to construct a confidence interval for a population mean.

Is the population normally distributed or is n ≥ 30?
• No: You cannot use the standard normal distribution or the t-distribution.
• Yes: Is σ known?
  • Yes: Use the standard normal distribution with \( E = z_c \frac{\sigma}{\sqrt{n}} \). (Section 6.1)
  • No: Use the t-distribution with \( E = t_c \frac{s}{\sqrt{n}} \) and n − 1 degrees of freedom. (Section 6.2)


Notice in the flowchart that when both n < 30 and the population is not 
normally distributed, you cannot use the standard normal distribution or the 
t-distribution. 
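The two questions in the flowchart can be written as a small decision function. This sketch is illustrative only: `choose_distribution` and its string return values are hypothetical names, and the caller is assumed to already know whether the population is normal and whether σ is known.

```python
def choose_distribution(sigma_known: bool, population_normal: bool, n: int) -> str:
    """Follow the flowchart for a confidence interval for a population mean."""
    # First question: is the population normal, or is the sample large (n >= 30)?
    if not (population_normal or n >= 30):
        return "neither"
    # Second question: is the population standard deviation sigma known?
    if sigma_known:
        return "standard normal"  # E = z_c * sigma / sqrt(n), Section 6.1
    return "t"  # E = t_c * s / sqrt(n) with n - 1 degrees of freedom, Section 6.2


# House-cost example: sigma = $28,000 is known, costs are normal, n = 25
print(choose_distribution(sigma_known=True, population_normal=True, n=25))
```

For the house-cost example that follows, this prints `standard normal`, the same decision reached by hand. Note that the order of the checks matters: a known σ does not help when the population is not normal and n < 30.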


Choosing the Standard Normal Distribution or the t-Distribution


You randomly select 25 newly constructed houses. The sample mean 
construction cost is $181,000 and the population standard deviation is $28,000. 
Assuming construction costs are normally distributed, should you use the
standard normal distribution, the t-distribution, or neither to construct a 95%
confidence interval for the population mean construction cost? Explain your 
reasoning. 


SOLUTION 


Is the population normally distributed or is n ≥ 30?
Yes, the population is normally distributed. Note that even though

n = 25 < 30

you can still use either the standard normal distribution or the t-distribution because the population is normally distributed.

Is σ known?
Yes.


Decision: 
Use the standard normal distribution. 


TRY IT YOURSELF 4 


You randomly select 18 adult male athletes and measure the resting heart 
rate of each. The sample mean heart rate is 64 beats per minute, with a 
sample standard deviation of 2.5 beats per minute. Assuming the heart rates 
are normally distributed, should you use the standard normal distribution, 
the t-distribution, or neither to construct a 90% confidence interval for the
population mean heart rate? Explain your reasoning. 

Answer: Page A36 




6.2 EXERCISES


Building Basic Skills and Vocabulary 


Finding Critical Values of t  In Exercises 1-4, find the critical value \( t_c \) for the
level of confidence c and sample size n.


1. c = 0.90, n = 10    2. c = 0.95, n = 12
3. c = 0.99, n = 16    4. c = 0.98, n = 40


In Exercises 5-8, find the margin of error for the values of c, s, and n.
5. c = 0.95, s = 5, n = 16
6. c = 0.99, s = 3, n = 6
7. c = 0.90, s = 2.4, n = 35
8. c = 0.98, s = 4.7, n = 9


In Exercises 9-12, construct the indicated confidence interval for the population mean μ using the t-distribution. Assume the population is normally distributed.
9. c = 0.90, x̄ = 12.5, s = 2.0, n = 6
10. c = 0.95, x̄ = 13.4, s = 0.85, n = 8
11. c = 0.98, x̄ = 43, s = 0.34, n = 14
12. c = 0.99, x̄ = 24.7, s = 4.6, n = 50
In Exercises 13-16, use the confidence interval to find the margin of error and the sample mean.
13. (14.7, 22.1)    14. (6.17, 8.53)
15. (64.6, 83.6)    16. (16.2, 29.8)


Using and Interpreting Concepts 


Constructing a Confidence Interval  In Exercises 17-20, you are given
the sample mean and the sample standard deviation. Assume the population is
normally distributed and use the t-distribution to find the margin of error and
construct a 95% confidence interval for the population mean. Interpret the results.


17. Commute Time In a random sample of eight people, the mean commute 
time to work was 35.5 minutes and the standard deviation was 7.2 minutes. 


18. Driving Distance In a random sample of five people, the mean driving 
distance to work was 22.2 miles and the standard deviation was 5.8 miles. 


19. Cell Phone Prices In a random sample of eight cell phones, the mean full 
retail price was $526.50 and the standard deviation was $184.00. 


20. Mobile Device Repair Costs In a random sample of 12 mobile devices, the 
mean repair cost was $90.42 and the standard deviation was $33.61. 


21. You research commute times to work and find that the population standard 
deviation is 9.3 minutes. Repeat Exercise 17 using the standard normal 
distribution with the appropriate calculations for a standard deviation that is 
known. Compare the results. 




22. You research driving distances to work and find that the population standard 
deviation is 5.2 miles. Repeat Exercise 18 using the standard normal 
distribution with the appropriate calculations for a standard deviation that is 
known. Compare the results. 


23. You research prices of cell phones and find that the population mean is
$431.61. In Exercise 19, does the t-value fall between \( -t_{0.95} \) and \( t_{0.95} \)?


24. You research repair costs of mobile devices and find that the population
mean is $89.56. In Exercise 20, does the t-value fall between \( -t_{0.95} \) and \( t_{0.95} \)?


Constructing a Confidence Interval  In Exercises 25-28, use the data set to
(a) find the sample mean, (b) find the sample standard deviation, and (c) construct 
a 99% confidence interval for the population mean. Assume the population is 
normally distributed. 


25. SAT Scores The SAT scores of 12 randomly selected high school seniors 


1130 1290 1010 1320 950 1250 
1340 1100 1260 1180 1470 920 


26. Grade Point Averages The grade point averages of 14 randomly selected 
college students 


2.3 3.3 2.6 1.8 3.1 4.0 0.7 2.3 2.0 3.1 3.4 1.3 2.6 2.6


27. College Football The weekly time (in hours) spent weight lifting for 
16 randomly selected college football players 


7.4 5.8 7.3 7.0 8.9 9.4 8.3 9.3
6.9 7.5 9.0 5.8 5.5 8.6 9.3 3.8


28. Homework The weekly time spent (in hours) on homework for 18 randomly 
selected high school students 


12.0 11.3 13.5 11.7 12.0 13.0 15.5 10.8 12.5
12.3 14.0 9.5 8.8 10.0 12.8 15.0 11.8 13.0


29. In Exercise 25, the population mean SAT score is 1020. Does the t-value fall
between \( -t_{0.99} \) and \( t_{0.99} \)? (Source: The College Board)


30. In Exercise 28, the population mean weekly time spent on homework by
students is 7.8 hours. Does the t-value fall between \( -t_{0.99} \) and \( t_{0.99} \)?


Constructing a Confidence Interval  In Exercises 31 and 32, use the data set
to (a) find the sample mean, (b) find the sample standard deviation, and (c) construct 
a 98% confidence interval for the population mean. 


31. Earnings The monthly earnings (in yen) of 32 randomly selected
teachers in Japan (Adapted from Indeed.com)


240,833 308,333 380,456 296,364 199,980 332,325 269,654 212,294 
236,245 222,301 362,321 254,269 154,196 401,039 354,967 241,178 
264,531 297,813 213,546 221,239 230,280 249,236 297,246 315,218 
349,516 299,271 301,629 251,000 296,362 241,528 303,085 311,520 


32. Earnings The annual earnings (in thousands of yen) of 40 randomly
selected editors in Japan (Adapted from Indeed.com)


3,614 4,284 3,548 3,694 4,182 5,142 4,568 3,954
4,215 4,169 3,941 5,052 4,941 4,313 4,654 4,719
3,845 3,964 4,418 3,874 3,674 3,224 5,248 5,920 
4,289 4,112 3,769 4,600 5,040 3,280 4,322 3,771 
4,758 5,920 3,691 5,178 2,120 3,040 2,940 2,949 






33. In Exercise 31, the population mean salary is 305,046 Yen. Does the t-value
fall between \( -t_{0.98} \) and \( t_{0.98} \)? (Source: Indeed.com)


34. In Exercise 32, the population mean salary is 3,750 Yen. Does the t-value fall
between \( -t_{0.98} \) and \( t_{0.98} \)? (Source: Indeed.com)


Choosing a Distribution  In Exercises 35-38, use the standard normal
distribution or the t-distribution to construct a 95% confidence interval for the 
population mean. Justify your decision. If neither distribution can be used, explain 
why. Interpret the results. 


35. Body Mass Index In a random sample of 50 people, the mean body mass
index (BMI) was 27.7 and the standard deviation was 6.12. 


36. GDP Growth Rates In a random sample of 18 quarters from 2008 
through 2016, the mean GDP growth rate for Russia was 0.403 and the 
standard deviation was 1.306. Assume the growth rates are normally 
distributed. (Source: Ieconomics.com) 


37. Gas Mileage The gas mileages (in miles per gallon) of 28 randomly
selected sports cars are listed. Assume the mileages are not normally 
distributed. 


21 30 19 20 21 24 18 24 27 20 22 30 25 26 
22 17 21 24 22 20 24 21 20 18 20 21 20 27 


38. Deliveries per Fifty In a recent Indian Premier League season, the 
population standard deviation of the deliveries faced to score fifty runs for all 
batsmen was 6.37. The deliveries per fifty of 10 randomly selected batsmen 
are listed. Assume the deliveries per fifty are normally distributed. (Source:
Indian Premier League)


15 19 20 22 23 25 29 32 36 49 


39. In Exercise 36, does it seem possible that the population mean could equal 
half the sample mean? Explain. 


40. In Exercise 38, does it seem possible that the population mean could be 
within 10% of the sample mean? Explain. 


Extending Concepts 


41. Tennis Ball Manufacturing A company manufactures tennis balls. When 
its tennis balls are dropped onto a concrete surface from a height of 
100 inches, the company wants the mean height the balls bounce upward to 
be 55.5 inches. This average is maintained by periodically testing random 
samples of 25 tennis balls. If the t-value falls between \( -t_{0.99} \) and \( t_{0.99} \), then
the company will be satisfied that it is manufacturing acceptable tennis balls. 
For a random sample, the mean bounce height of the sample is 56.0 inches 
and the standard deviation is 0.25 inch. Assume the bounce heights are 
approximately normally distributed. Is the company making acceptable 
tennis balls? Explain. 


42. Light Bulb Manufacturing A company manufactures light bulbs. The 
company wants the bulbs to have a mean life span of 1000 hours. This 
average is maintained by periodically testing random samples of 16 light 
bulbs. If the t-value falls between \( -t_{0.99} \) and \( t_{0.99} \), then the company will
be satisfied that it is manufacturing acceptable light bulbs. For a random 
sample, the mean life span of the sample is 1015 hours and the standard 
deviation is 25 hours. Assume the life spans are approximately normally 
distributed. Is the company making acceptable light bulbs? Explain. 


ACTIVITY 


Confidence Intervals for a Mean 


APPLET

You can find the interactive applet for this activity within MyLab Statistics or at www.pearsonglobaleditions.com.


The confidence intervals for a mean (the impact of not knowing the standard 
deviation) applet allows you to visually investigate confidence intervals for 
a population mean. You can specify the sample size n, the shape of the 
distribution (Normal or Right-skewed), the population mean (Mean), and the 
true population standard deviation (Std. Dev.). When you click SIMULATE, 
100 separate samples of size n will be selected from a population with these 
population parameters. For each of the 100 samples, a 95% Z confidence 
interval (known standard deviation) and a 95% T confidence interval (unknown 
standard deviation) are displayed in the plot at the right. The 95% Z confidence 
interval is displayed in green and the 95% T confidence interval is displayed in 
blue. When an interval does not contain the population mean, it is displayed in 
red. Additional simulations can be carried out by clicking SIMULATE multiple 
times. The cumulative number of times that each type of interval contains the 
population mean is also shown. Press CLEAR to clear existing results and start 
a new simulation. 


Step 1 Specify a value for n.
Step 2 Specify a distribution.
Step 3 Specify a value for the mean.
Step 4 Specify a value for the standard deviation.
Step 5 Click SIMULATE to generate the confidence intervals.

[Figure: applet panel with SIMULATE and Clear buttons and a "Cumulative results" table for the 95% ZCI and 95% TCI (rows: Contained mean, Did not contain mean, Prop. contained)]


DRAW CONCLUSIONS 


1. Set n = 30, Mean = 25, Std. Dev. = 5, and the distribution to Normal. 
Run the simulation so that at least 1000 confidence intervals are generated. 
Compare the proportion of the 95% Z confidence intervals and 95% 
T confidence intervals that contain the population mean. Is this what you 
would expect? Explain. 


2. In a random sample of 24 high school students, the mean number of hours 
of sleep per night during the school week was 7.26 hours and the standard 
deviation was 1.19 hours. Assume the sleep times are normally distributed. 
Run the simulation for n = 10 so that at least 500 confidence intervals are
generated. What proportion of the 95% Z confidence intervals and 95% 
T confidence intervals contain the population mean? Should you use a 
Z confidence interval or a T confidence interval for the mean number of hours 
of sleep? Explain. 
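The applet's experiment can be approximated offline. The sketch below is a stand-in written under stated assumptions, not the applet's actual code: it samples only from a normal population (the applet's right-skewed option is omitted), uses a seeded `random.Random` so runs are repeatable, and takes the t critical value as an argument because the standard library cannot compute it. For n = 30 (d.f. = 29), Table 5 gives \( t_c = 2.045 \) at 95% confidence.

```python
import math
import random
import statistics
from statistics import NormalDist


def simulate_coverage(n, mu, sigma, t_c, reps=1000, level=0.95, seed=1):
    """Return the proportions of Z and T intervals that contain mu."""
    rng = random.Random(seed)
    z_c = NormalDist().inv_cdf((1 + level) / 2)  # about 1.96 for level = 0.95
    z_hits = t_hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = statistics.fmean(sample)
        s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
        e_z = z_c * sigma / math.sqrt(n)  # Z interval: uses the true sigma
        e_t = t_c * s / math.sqrt(n)  # T interval: uses the sample s
        z_hits += mean - e_z < mu < mean + e_z  # True counts as 1
        t_hits += mean - e_t < mu < mean + e_t
    return z_hits / reps, t_hits / reps


# Draw Conclusions 1: n = 30, Mean = 25, Std. Dev. = 5, Normal; t_c = 2.045 for d.f. = 29
z_prop, t_prop = simulate_coverage(30, 25, 5, t_c=2.045)
print(f"Z intervals: {z_prop:.3f}, T intervals: {t_prop:.3f}")
```

Both proportions should land close to 0.95; the exact values depend on the seed and on `reps`.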




Marathon Training 


A marathon is a foot race with a distance of 26.22 miles. It was one of the original events of 
the modern Olympics, where it was a men’s only event. The women’s marathon became an 
Olympic event in 1984. The Olympic record for the men’s marathon was set during the 2008 
Olympics by Samuel Kamau Wanjiru of Kenya, with a time of 2 hours, 6 minutes, 32 seconds. 
The Olympic record for the women’s marathon was set during the 2012 Olympics by Tiki 
Gelana of Ethiopia, with a time of 2 hours, 23 minutes, 7 seconds.

Training for a marathon typically lasts at least 6 months. The training is gradual, with 
increases in distance about every 2 weeks. About 1 to 3 weeks before the race, the distance 
run is decreased slightly. The stem-and-leaf plots below show the marathon training times 
(in minutes) for a random sample of 30 male runners and 30 female runners. 


Training Times (in minutes) of Male Runners    Key: 15|5 = 155
15 | 5 8 9 9 9
16 | 0 0 0 0 1 2 3 4 4 5 8 9
17 | 0 1 1 3 5 6 6 7 7 9
18 | 0 1 5

Training Times (in minutes) of Female Runners    Key: 17|8 = 178
17 | 8 9 9
18 | 0 0 0 0 1 2 3 4 6 6 7 9
19 | 0 0 0 1 3 4 5 5 6 6
20 | 0 0 1 2 3
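To work Exercises 1 and 2 with technology, the plots first have to be expanded back into data values. This is a minimal sketch, assuming the leaves are read from the plots exactly as listed in the code (30 values per group); `from_stem_and_leaf` is a hypothetical helper, and `statistics.stdev` computes the sample standard deviation (dividing by n − 1).

```python
import statistics


def from_stem_and_leaf(rows):
    """Expand stem-and-leaf rows such as {15: "58999"} into data values.

    Each value is 10 * stem + leaf (Key: 15|5 = 155).
    """
    return [10 * stem + int(leaf) for stem, leaves in rows.items() for leaf in leaves]


male = from_stem_and_leaf({
    15: "58999",
    16: "000012344589",
    17: "0113566779",
    18: "015",
})
female = from_stem_and_leaf({
    17: "899",
    18: "000012346679",
    19: "0001345566",
    20: "00123",
})

# Print the sample size, sample mean, and sample standard deviation for each group
for label, times in (("male", male), ("female", female)):
    print(label, len(times), round(statistics.fmean(times), 2), round(statistics.stdev(times), 2))
```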


EXERCISES 


1. Use the sample to find a point estimate for the mean training time of the
   (a) male runners.
   (b) female runners.

2. Find the sample standard deviation of the training times for the
   (a) male runners.
   (b) female runners.

3. Use the sample to construct a 95% confidence interval for the population mean training time of the
   (a) male runners.
   (b) female runners.

4. Interpret the results of Exercise 3.

5. Use the sample to construct a 95% confidence interval for the population mean training time of all runners. How do your results differ from those in Exercise 3? Explain.

6. A trainer wants to estimate the population mean running times for both male and female runners within 2 minutes. Determine the minimum sample size required to construct a 99% confidence interval for the population mean training time of
   (a) male runners. Assume the population standard deviation is 8.9 minutes.
   (b) female runners. Assume the population standard deviation is 8.4 minutes.


Case Study 341 




6.3 Confidence Intervals for Population Proportions

What You Should Learn
• How to find a point estimate for a population proportion
• How to construct and interpret confidence intervals for a population proportion
• How to determine the minimum sample size required when estimating a population proportion

Point Estimate for a Population Proportion • Confidence Intervals for a Population Proportion • Finding a Minimum Sample Size

Point Estimate for a Population Proportion

Recall from Section 4.2 that the probability of success in a single trial of a binomial experiment is p. This probability is a population proportion. In this section, you will learn how to estimate a population proportion p using a confidence interval. As with confidence intervals for μ, you will start with a point estimate.

DEFINITION

The point estimate for p, the population proportion of successes, is given by the proportion of successes in a sample and is denoted by

\[ \hat{p} = \frac{x}{n} \qquad \text{Sample proportion} \]

where x is the number of successes in the sample and n is the sample size. The point estimate for the population proportion of failures is \( \hat{q} = 1 - \hat{p} \). The symbols p̂ and q̂ are read as "p hat" and "q hat."


Finding a Point Estimate for p 


In a survey of 1550 U.S. adults, 1054 said that they use the social media website 
Facebook. Find a point estimate for the population proportion of U.S. adults 
who use Facebook. (Adapted from Pew Research Center) 


Study Tip In Sections 6.1 and 6.2, estimates were made for quantitative data. In this section, sample proportions are used to make estimates for qualitative data.

SOLUTION The number of successes is the number of adults who use Facebook, so x = 1054. The sample size is n = 1550. So, the sample proportion is

\[
\begin{aligned}
\hat{p} &= \frac{x}{n} && \text{Formula for sample proportion} \\
&= \frac{1054}{1550} && \text{Substitute 1054 for } x \text{ and 1550 for } n. \\
&= 0.68 && \text{Divide.} \\
&= 68\% && \text{Write as a percent.}
\end{aligned}
\]


So, the point estimate for the population proportion of U.S. adults who use 
Facebook is 0.68 or 68%. 


TRY IT YOURSELF 1 


A poll surveyed 4780 U.S. adults about how often they shop online. The results are shown in the table. Find a point estimate for the population proportion of U.S. adults who shop online at least once a week. (Adapted from Pew Research Center)

How often do you shop online?    Number responding yes
At least once a week             717
A few times a month              …
Less often                       1769
Never                            956

Answer: Page A36


SECTION 6.3 Confidence Intervals for Population Proportions 343 


Picturing the World

A poll surveyed 1519 U.S. adults about global climate change. Of those surveyed, 936 said that they expect to make major changes in their lives to address problems from climate change in the next 50 years. (Adapted from Pew Research Center)

[Figure: "In the Next 50 Years, Do You Think You Will Make Major Changes to Your Way of Life in Order to Address Problems from Global Climate Change?"]

Find a 90% confidence interval for the population proportion of people who expect to make major changes in their lives to address problems from climate change in the next 50 years.


Confidence Intervals for a Population Proportion

Constructing a confidence interval for a population proportion p is similar to constructing a confidence interval for a population mean. You start with a point estimate and calculate a margin of error.

DEFINITION

A c-confidence interval for a population proportion p is

\[ \hat{p} - E < p < \hat{p} + E \]

where

\[ E = z_c \sqrt{\frac{\hat{p}\hat{q}}{n}}. \]

The probability that the confidence interval contains p is c, assuming that the estimation process is repeated a large number of times.

In Section 5.5, you learned that a binomial distribution can be approximated by a normal distribution when np ≥ 5 and nq ≥ 5. When np ≥ 5 and nq ≥ 5, the sampling distribution of p̂ is approximately normal with a mean of

\[ \mu_{\hat{p}} = p \qquad \text{Mean of the sample proportions} \]

and a standard error of

\[ \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}. \qquad \text{Standard error of the sample proportions} \]

Notice that, because \( \hat{p} = x/n \) and the number of successes x has standard deviation \( \sigma = \sqrt{npq} \),

\[ \sigma_{\hat{p}} = \frac{\sigma}{n} = \frac{\sqrt{npq}}{n} = \sqrt{\frac{npq}{n^2}} = \sqrt{\frac{pq}{n}}. \]
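The claim that p̂ is centered at p with standard error \( \sqrt{pq/n} \) can be checked by simulation. A minimal sketch, assuming the illustrative values p = 0.68 and n = 200 (smaller than the sample sizes in the examples, to keep the run fast) and a seeded generator; `simulate_sample_proportions` is a hypothetical name. Each replicate counts the successes in n independent trials and divides by n.

```python
import math
import random
import statistics


def simulate_sample_proportions(p, n, reps=5000, seed=7):
    """Return `reps` simulated sample proportions x / n, each from n trials with success probability p."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


p, n = 0.68, 200
p_hats = simulate_sample_proportions(p, n)
print("mean of p-hats:", round(statistics.fmean(p_hats), 4), "vs p =", p)
print("SD of p-hats:", round(statistics.stdev(p_hats), 4),
      "vs sqrt(pq/n) =", round(math.sqrt(p * (1 - p) / n), 4))
```

With 5000 replicates, the simulated mean and standard deviation typically agree with p and \( \sqrt{pq/n} \) to within a few thousandths.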


GUIDELINES

Constructing a Confidence Interval for a Population Proportion

In Words / In Symbols
1. Identify the sample statistics n and x.
2. Find the point estimate p̂.
   \( \hat{p} = \frac{x}{n} \)
3. Verify that the sampling distribution of p̂ can be approximated by a normal distribution.
   \( n\hat{p} \ge 5, \; n\hat{q} \ge 5 \)
4. Find the critical value \( z_c \) that corresponds to the given level of confidence c.
   Use Table 4 in Appendix B.
5. Find the margin of error E.
   \( E = z_c \sqrt{\frac{\hat{p}\hat{q}}{n}} \)
6. Find the left and right endpoints and form the confidence interval.
   Left endpoint: \( \hat{p} - E \)
   Right endpoint: \( \hat{p} + E \)
   Interval: \( \hat{p} - E < p < \hat{p} + E \)

Tech Tip
Here are instructions for constructing a confidence interval for a population proportion on a TI-84 Plus.
STAT
Choose the TESTS menu.
A: 1-PropZInt...
Enter the values of x, n, and the level of confidence c (C-Level). Then select Calculate.

In Step 4 above, note that the critical value \( z_c \) is found the same way it was found in Section 6.1, by either using Table 4 in Appendix B or using technology.
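The six guideline steps translate into a short function. This is a minimal sketch: `proportion_interval` is a hypothetical name, and it computes \( z_c \) with `statistics.NormalDist().inv_cdf` instead of reading Table 4, so its endpoints match technology output rather than results based on the rounded table value \( z_c = 1.96 \).

```python
import math
from statistics import NormalDist


def proportion_interval(x, n, c):
    """Return (p_hat, E, left, right) for a c-confidence interval for p."""
    p_hat = x / n  # Step 2: point estimate
    q_hat = 1 - p_hat
    # Step 3: the normal approximation needs n*p_hat >= 5 and n*q_hat >= 5
    if n * p_hat < 5 or n * q_hat < 5:
        raise ValueError("normal approximation not justified: need np̂ ≥ 5 and nq̂ ≥ 5")
    z_c = NormalDist().inv_cdf((1 + c) / 2)  # Step 4: critical value
    e = z_c * math.sqrt(p_hat * q_hat / n)  # Step 5: margin of error
    return p_hat, e, p_hat - e, p_hat + e  # Step 6: endpoints


# Facebook survey from Example 1: x = 1054, n = 1550, c = 0.95
p_hat, e, left, right = proportion_interval(1054, 1550, 0.95)
print(f"{left:.3f} < p < {right:.3f}  (E ≈ {e:.3f})")
```

Rounded to three decimal places, this gives 0.657 < p < 0.703 with E ≈ 0.023; the unrounded endpoints agree with the technology output shown for Example 2 (0.65678 and 0.70322).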




Study Tip
Notice in Example 2 that the confidence interval for the population proportion p is rounded to three decimal places. This round-off rule will be used throughout the text.


Minitab and TI-84 Plus steps are 
shown on pages 366 and 367. 


Constructing a Confidence Interval for p 


Use the data in Example 1 to construct a 95% confidence interval for the 
population proportion of U.S. adults who use Facebook. 


SOLUTION From Example 1, p̂ = 0.68. So, the point estimate for the population proportion of failures is

\[ \hat{q} = 1 - 0.68 = 0.32. \]

Using n = 1550, you can verify that the sampling distribution of p̂ can be approximated by a normal distribution.

\[ n\hat{p} = (1550)(0.68) = 1054 > 5 \]
and
\[ n\hat{q} = (1550)(0.32) = 496 > 5 \]

Using \( z_c = 1.96 \), the margin of error is

\[ E = z_c \sqrt{\frac{\hat{p}\hat{q}}{n}} = 1.96 \sqrt{\frac{(0.68)(0.32)}{1550}} \approx 0.023. \]

Next, find the left and right endpoints and form the 95% confidence interval.

Left Endpoint: \( \hat{p} - E \approx 0.68 - 0.023 = 0.657 \)
Right Endpoint: \( \hat{p} + E \approx 0.68 + 0.023 = 0.703 \)

\[ 0.657 < p < 0.703 \]

[Figure: number line from 0.64 to 0.72 marking the interval from 0.657 to 0.703 around p̂ = 0.68]


You can check this answer using technology, as shown below. (When using 
technology, your answers may differ slightly from those found using Table 4.) 


95% confidence interval results:

Proportion   Count   Total   Sample Prop.   Std. Err.   L. Limit   U. Limit
p            1054    1550    0.68           0.011849    0.65678    0.70322


Interpretation With 95% confidence, you can say that the population 
proportion of U.S. adults who use Facebook is between 65.7% and 70.3%. 
TRY IT YOURSELF 2 


Use the data in Try It Yourself 1 to construct a 90% confidence interval for 
the population proportion of U.S. adults who shop online at least once a week. 
Answer: Page A36 


The confidence level of 95% used in Example 2 is typical of opinion polls. The result, however, is usually not stated as a confidence interval. Instead, the result of Example 2 would be stated as shown.


A survey found that 68% of U.S. adults use Facebook. 
The margin of error for the survey is 2.3%. 


To explore this topic further, 
see Activity 6.3 on page 351. 


TI-84 PLUS
1-PropZInt
(.22957, .31043)
p̂=.27
n=800




Constructing a Confidence Interval for p 


The figure below is from a survey of 800 U.S. adults ages 18 to 29. Construct a 
99% confidence interval for the population proportion of 18- to 29-year-olds 
who get their news on television. (Adapted from Pew Research Center) 


[Figure: bar graph, "Percent of 18- to 29-year-olds who get news on each platform": Online …; Radio 14%; Print newspapers 5%; Television 27%]


SOLUTION From the figure, p̂ = 0.27. So, \( \hat{q} = 1 - 0.27 = 0.73 \). Using n = 800, note that

\[ n\hat{p} = (800)(0.27) = 216 > 5 \]
and
\[ n\hat{q} = (800)(0.73) = 584 > 5. \]

So, the sampling distribution of p̂ is approximately normal. Using \( z_c = 2.575 \) (use Table 4 in Appendix B to estimate that \( z_c \) is halfway between 2.57 and 2.58), the margin of error is

\[ E = z_c \sqrt{\frac{\hat{p}\hat{q}}{n}} \approx 2.575 \sqrt{\frac{(0.27)(0.73)}{800}} \approx 0.040. \]

Next, find the left and right endpoints and form the 99% confidence interval.

Left Endpoint: \( \hat{p} - E \approx 0.27 - 0.040 = 0.230 \)
Right Endpoint: \( \hat{p} + E \approx 0.27 + 0.040 = 0.310 \)

\[ 0.230 < p < 0.310 \]

[Figure: number line from 0.21 to 0.32 marking the interval from 0.230 to 0.310 around p̂ = 0.27]
You can check this answer using technology, as shown at the left. 


Interpretation With 99% confidence, you can say that the population 
proportion of 18- to 29-year-olds who get their news on television is between 
23.0% and 31.0%. 


TRY IT YOURSELF 3 


Use the data in Example 3 to construct a 99% confidence interval for the 
population proportion of 18- to 29-year-olds who get their news online. 
Answer: Page A36 




Finding a Minimum Sample Size 


One way to increase the precision of a confidence interval without decreasing the 
level of confidence is to increase the sample size. 


Finding a Minimum Sample Size to Estimate p 


Study Tip
The reason for using p̂ = 0.5 and q̂ = 0.5 when no preliminary estimate is available is that these values yield the maximum value of the product p̂q̂. (See Exercise 37.) In other words, without an estimate of p, you must pay the penalty of using a larger sample.


Given a c-confidence level and a margin of error E, the minimum sample size n needed to estimate the population proportion p is

\[ n = \hat{p}\hat{q}\left(\frac{z_c}{E}\right)^2. \]

If n is not a whole number, then round n up to the next whole number (see Example 4). Also, note that this formula assumes that you have preliminary estimates of p̂ and q̂. If not, use p̂ = 0.5 and q̂ = 0.5.


Determining a Minimum Sample Size 


You are running a political campaign and wish to estimate, with 95% confidence, 
the population proportion of registered voters who will vote for your candidate. 
Your estimate must be accurate within 3% of the population proportion. Find
the minimum sample size needed when (1) no preliminary estimate is available
and (2) a preliminary estimate gives p̂ = 0.31. Compare your results.


SOLUTION 


1. Because you do not have a preliminary estimate of p̂, use p̂ = 0.5 and q̂ = 0.5. Using \( z_c = 1.96 \) and E = 0.03, you can solve for n.

\[ n = \hat{p}\hat{q}\left(\frac{z_c}{E}\right)^2 = (0.5)(0.5)\left(\frac{1.96}{0.03}\right)^2 \approx 1067.11 \]

Because n is not a whole number, round up to the next whole number, 1068.

2. You have a preliminary estimate of p̂ = 0.31. So, q̂ = 0.69. Using \( z_c = 1.96 \) and E = 0.03, you can solve for n.

\[ n = \hat{p}\hat{q}\left(\frac{z_c}{E}\right)^2 = (0.31)(0.69)\left(\frac{1.96}{0.03}\right)^2 \approx 913.02 \]

Because n is not a whole number, round up to the next whole number, 914.


Interpretation With no preliminary estimate, the sample size should be at least 1068 registered voters. With a preliminary estimate of p̂ = 0.31, the sample size should be at least 914 registered voters. So, you will need a larger sample size when no preliminary estimate is available.
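The sample-size formula and the round-up rule can be checked with a few lines of Python. A minimal sketch: `min_sample_size` is a hypothetical name, and the default p̂ = 0.5 follows the rule stated above for when no preliminary estimate is available.

```python
import math


def min_sample_size(z_c, e, p_hat=None):
    """Minimum n = p̂·q̂·(z_c / E)², rounded up to the next whole number."""
    if p_hat is None:
        p_hat = 0.5  # no preliminary estimate: use p̂ = q̂ = 0.5
    q_hat = 1 - p_hat
    n = p_hat * q_hat * (z_c / e) ** 2
    # Round to 9 decimals before ceil so floating-point noise
    # (e.g. 1067.0000000001) does not push an exact whole number up by one.
    return math.ceil(round(n, 9))


print(min_sample_size(1.96, 0.03))        # 1068 (no preliminary estimate)
print(min_sample_size(1.96, 0.03, 0.31))  # 914 (preliminary estimate p̂ = 0.31)
```

The two calls reproduce parts (1) and (2) of Example 4.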


TRY IT YOURSELF 4 


A researcher is estimating the population proportion of people in the United 
States who delayed seeking medical care during the last 12 months due to 
costs. The estimate must be accurate within 2% of the population proportion 
with 90% confidence. Find the minimum sample size needed when (1) no 
preliminary estimate is available and (2) a previous survey found that 6.3% 
of people in the United States delayed seeking medical care during the last 12 
months due to costs. (Source: NCHS, National Health Interview Survey) 

Answer: Page A36 




6.3 EXERCISES For Extra Help: MyLab Statistics


Building Basic Skills and Vocabulary 


True or False?  In Exercises 1 and 2, determine whether the statement is true or
false. If it is false, rewrite it as a true statement. 


1. To estimate the value of p, the population proportion of successes, use the 
point estimate x. 


2. The point estimate for the population proportion of failures is 1 — p. 


Finding p̂ and q̂  In Exercises 3-6, let p be the population proportion for the
situation. Find point estimates of p and q. 


3. Tax Fraud In a survey of 1040 U.S. adults, 62 have had someone
impersonate them to try to claim tax refunds. (Adapted from Pew Research
Center)


4. Investigating Crimes In a survey of 1040 U.S. adults, 478 believe the 
government should be able to access encrypted communications when 
investigating crimes. (Adapted from Pew Research Center) 


5. Mainstream Media In a survey of 2016 U.S. adults, 1310 think mainstream
media is more interested in making money than in telling the truth. (Adapted 
from Ipsos Public Affairs) 


6. Terrorism In a survey of 2016 U.S. adults, 665 believe America should stop
terrorism at all costs. (Adapted from Ipsos Public Affairs) 


In Exercises 7-10, use the confidence interval to find the margin of error and the
sample proportion.
7. (0.905, 0.933) 8. (0.245, 0.475) 
9. (0.512, 0.596) 10. (0.087, 0.263) 


Using and Interpreting Concepts 


Constructing Confidence Intervals  In Exercises 11 and 12, construct 90%
and 95% confidence intervals for the population proportion. Interpret the results 
and compare the widths of the confidence intervals. 


11. New Year’s Resolutions In a survey of 2241 U.S. adults in a recent year,
1322 say they have made a New Year’s resolution. (Adapted from The Harris 
Poll) 


12. New Year’s Resolutions In a survey of 2241 U.S. adults in a recent year, 
650 made a New Year’s resolution to eat healthier. (Adapted from The Harris 
Poll) 


Constructing Confidence Intervals  In Exercises 13 and 14, construct a
99% confidence interval for the population proportion. Interpret the results. 


13. Police Body Cameras In a survey of 1000 U.S. adults, 700 think police
officers should be required to wear body cameras while on duty. (Adapted
from Rasmussen Reports)


14. Teacher Body Cameras In a survey of 600 United Kingdom teachers, 226 
say they would wear a body camera in school. (Adapted from Times Education 
Supplement) 




CHAPTER 6 Confidence Intervals 


15. LGBT Identification In a survey of 1,626,773 U.S. adults, 49,311 personally
identify as lesbian, gay, bisexual, or transgender. Construct a 95% confidence
interval for the population proportion of U.S. adults who personally identify
as lesbian, gay, bisexual, or transgender. (Source: Gallup)

16. Transgender Bathroom Policy In a survey of 1000 U.S. adults, 490 oppose
allowing transgender students to use the bathrooms of the opposite biological
sex. Construct a 90% confidence interval for the population proportion of
U.S. adults who oppose allowing transgender students to use the bathrooms
of the opposite biological sex. (Adapted from Rasmussen Reports)

17. Congress You wish to estimate, with 95% confidence, the population
proportion of U.S. adults who think Congress is doing a good or excellent
job. Your estimate must be accurate within 4% of the population proportion.

(a) No preliminary estimate is available. Find the minimum sample size
needed.

(b) Find the minimum sample size needed, using a prior survey that found
that 25% of U.S. adults think Congress is doing a good or excellent
job. (Source: Rasmussen Reports)

(c) Compare the results from parts (a) and (b).

18. Genetically Modified Organisms You wish to estimate, with 99%
confidence, the population proportion of U.S. adults who support labeling
legislation for genetically modified organisms (GMOs). Your estimate must
be accurate within 2% of the population proportion.

(a) No preliminary estimate is available. Find the minimum sample size
needed.

(b) Find the minimum sample size needed, using a prior survey that found
that 75% of U.S. adults support labeling legislation for GMOs. (Source:
The Harris Poll)

(c) Compare the results from parts (a) and (b).

19. Fast Food You wish to estimate, with 90% confidence, the population
proportion of U.S. adults who eat fast food four to six times per week. Your
estimate must be accurate within 3% of the population proportion.

(a) No preliminary estimate is available. Find the minimum sample size
needed.

(b) Find the minimum sample size needed, using a prior study that found
that 11% of U.S. adults eat fast food four to six times per week. (Source:
Statista)

(c) Compare the results from parts (a) and (b).

20. Alcohol-Impaired Driving You wish to estimate, with 95% confidence,
the population proportion of motor vehicle fatalities that were caused by
alcohol-impaired driving. Your estimate must be accurate within 5% of the
population proportion.

(a) No preliminary estimate is available. Find the minimum sample size
needed.

(b) Find the minimum sample size needed, using a prior study that found
that 31% of motor vehicle fatalities were caused by alcohol-impaired
driving. (Source: WalletHub)

(c) Compare the results from parts (a) and (b).
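Exercises 17-20 use the minimum sample size formula n = p̂q̂(z_c/E)², always rounded up to the next whole number, with p̂ = q̂ = 0.5 when no preliminary estimate is available. A minimal Python sketch under those assumptions follows; `min_sample_size` is a hypothetical name, the values c = 0.95 and E = 0.03 are illustrative, and z_c comes from `statistics.NormalDist` rather than a table.

```python
import math
from statistics import NormalDist


def min_sample_size(c, e, p_hat=None):
    """Minimum n = p_hat * q_hat * (z_c / E)^2, rounded up to a whole number.

    With no preliminary estimate, p_hat = q_hat = 0.5 is used.
    """
    if p_hat is None:
        p_hat = 0.5
    q_hat = 1 - p_hat
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    return math.ceil(p_hat * q_hat * (z_c / e) ** 2)


print(min_sample_size(0.95, 0.03))        # no preliminary estimate -> 1068
print(min_sample_size(0.95, 0.03, 0.25))  # preliminary estimate p_hat = 0.25 -> 801
```

Here the preliminary estimate lowers the required sample size, because p̂q̂ = 0.1875 is smaller than 0.25.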


SECTION 6.3 Confidence Intervals for Population Proportions 349 


21. In Exercise 11, does it seem possible that the population proportion could 
equal 0.59? Explain. 


22. In Exercise 14, does it seem possible that the population proportion could be 
within 1% of the point estimate? Explain. 


23. In Exercise 17(b), would a sample size of 200 be acceptable? Explain. 
24. In Exercise 20(b), would a sample size of 600 be acceptable? Explain. 


Constructing Confidence Intervals In Exercises 25 and 26, use the figure, 
which shows the results of a survey in which 1003 adults from the United States, 
1020 adults from Canada, 999 adults from France, 1000 adults from Japan, and 
1000 adults from Australia were asked whether national identity is strongly tied to 
birthplace. (Source: Pew Research Center) 


[Figure: percentage of adults in each country who say national identity is strongly tied to birthplace]


25. National Identity Construct a 99% confidence interval for the population 
proportion of adults who say national identity is strongly tied to birthplace 
for each country listed. 


26. In Exercise 25, does it seem possible that any of the population proportions 
could be equal? Explain. 


Constructing Confidence Intervals In Exercises 27 and 28, use the figure, 
which shows the results of a survey in which 2000 U.S. college graduates from the 
year 2016 were asked questions about employment. (Source: Accenture) 


27. Employment Construct (a) a 95% confidence interval and (b) a 99% 
confidence interval for the population proportion of college students who 
gave each response. 


28. In Exercise 27, does it seem possible that any of the population proportions 
could be equal? Explain. 




Extending Concepts 


Translating Statements In Exercises 29-34, translate the statement into a 
confidence interval. Approximate the level of confidence. 


29. In a survey of 1003 U.S. adults, 70% said being able to speak English is at
the core of national identity. The survey’s margin of error is ±3.4%. (Source:
Pew Research Center)

30. In a survey of 1503 U.S. adults, 79% say people have the right to nonviolent
protest. The survey’s margin of error is ±2.9%. (Source: Pew Research Center)

31. In a survey of 1000 U.S. adults, 71% think teaching is one of the most
important jobs in our country today. The survey’s margin of error is ±3%.
(Source: Rasmussen Reports)

32. In a survey of 1035 U.S. adults, 37% say the U.S. spends too little on defense.
The survey’s margin of error is ±4%. (Source: Gallup)

33. In a survey of 3539 U.S. adults, 47% believe the economy is getting better.
Three weeks prior to this survey, 53% believed the economy was getting
better. The survey’s margin of error is ±2%. (Source: Gallup)

34. In a survey of 1052 parents of children ages 8-14, 68% say they are willing
to get a second or part-time job to pay for their children’s college education,
and 42% say they lose sleep worrying about college costs. The survey’s
margin of error is ±3%. (Source: T. Rowe Price Group, Inc.)

35. Why Check It? Why is it necessary to check that $n\hat{p} \ge 5$ and $n\hat{q} \ge 5$?

36. Sample Size The equation for determining the sample size

$$n = \hat{p}\hat{q}\left(\frac{z_c}{E}\right)^2$$

can be obtained by solving the equation for the margin of error

$$E = z_c\sqrt{\frac{\hat{p}\hat{q}}{n}}$$

for n. Show that this is true and justify each step.

37. Maximum Value of p̂q̂ Complete the tables for different values of p̂ and
q̂ = 1 − p̂. From the tables, which value of p̂ appears to give the maximum
value of the product p̂q̂?

p̂      q̂ = 1 − p̂    p̂q̂
0.0    1.0          0.00
0.1    0.9          0.09
0.2    0.8
0.3
0.4
0.5
0.6
0.7
0.8
0.9

p̂      q̂ = 1 − p̂    p̂q̂
0.45
0.46
0.47
0.48
0.49
0.50
0.51
0.52
0.53
0.54
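For Exercises 29-34, one way to approximate the level of confidence is to solve E = z_c√(p̂q̂/n) for z_c and then find the area between −z_c and z_c under the standard normal curve. The Python sketch below assumes that approach; `translate_statement` is a hypothetical helper, and the inputs p̂ = 0.50, E = 0.03, n = 1000 are illustrative rather than taken from an exercise.

```python
import math
from statistics import NormalDist


def translate_statement(p_hat, e, n):
    """Return the interval (p_hat - e, p_hat + e) and an approximate level c."""
    interval = (p_hat - e, p_hat + e)
    # Solve E = z_c * sqrt(p_hat * q_hat / n) for z_c.
    z_c = e / math.sqrt(p_hat * (1 - p_hat) / n)
    # c is the area between -z_c and z_c under the standard normal curve.
    c = 2 * NormalDist().cdf(z_c) - 1
    return interval, c


interval, c = translate_statement(0.50, 0.03, 1000)
print(f"({interval[0]:.2f}, {interval[1]:.2f}), c is about {c:.0%}")
```

With these illustrative inputs, z_c is about 1.90 and the level of confidence comes out at roughly 94%.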


ACTIVITY 


APPLET 


You can find the interactive
applet for this activity
within MyLab Statistics or at
www.pearsonglobaleditions.com.


Confidence Intervals for a Proportion 


The confidence intervals for a proportion applet allows you to visually 
investigate confidence intervals for a population proportion. You can specify 
the sample size n and the population proportion p. When you click SIMULATE,
100 separate samples of size n will be selected from a population with a 
proportion of successes equal to p. For each of the 100 samples, a 95% 
confidence interval (in green) and a 99% confidence interval (in blue) are 
displayed in the plot at the right. Each of these intervals is computed using 
the standard normal approximation. When an interval does not contain the 
population proportion, it is displayed in red. Note that the 99% confidence 
interval is always wider than the 95% confidence interval. Additional 
simulations can be carried out by clicking SIMULATE multiple times. The 
cumulative number of times that each type of interval contains the population 
proportion is also shown. Press CLEAR to clear existing results and start a 
new simulation. 


[Figure: applet screen with inputs n = 100 and p = 0.5, SIMULATE and CLEAR buttons, and a cumulative results table (contained p, did not contain p, proportion contained) for the 95% and 99% confidence intervals]


Step 1 Specify a value for n. 
Step 2 Specify a value for p. 
Step 3 Click SIMULATE to generate the confidence intervals. 


DRAW CONCLUSIONS 


1. Run the simulation for p = 0.6 and n = 10, 20, 40, and 100. Clear the
results after each trial. What proportion of the confidence intervals for
each confidence level contains the population proportion? What happens
to the proportion of confidence intervals that contains the population
proportion for each confidence level as the sample size increases?


2. Run the simulation for p = 0.4 and n = 100 so that at least 1000 
confidence intervals are generated. Compare the proportion of confidence 
intervals that contains the population proportion for each confidence level. 
Is this what you would expect? Explain. 
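The applet's experiment can also be imitated offline. The Python sketch below is an assumption about how such a simulation could work, not the applet's actual code: for each of `num_samples` samples of size n, it builds a 95% and a 99% interval with the standard normal approximation and reports the proportion of each kind that contains p. The name `simulate_intervals` and the fixed random seed are hypothetical choices. Because every 99% interval contains the matching 95% interval, the 99% proportion can never be smaller.

```python
import math
import random
from statistics import NormalDist


def simulate_intervals(n, p, num_samples=100, seed=0):
    """Imitate the applet: build 95% and 99% intervals for num_samples samples
    of size n and return the proportion of each kind that contains p."""
    rng = random.Random(seed)
    z = {0.95: NormalDist().inv_cdf(0.975), 0.99: NormalDist().inv_cdf(0.995)}
    contained = {0.95: 0, 0.99: 0}
    for _ in range(num_samples):
        x = sum(1 for _ in range(n) if rng.random() < p)  # successes in the sample
        p_hat = x / n
        se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error estimated from p_hat
        for level, z_c in z.items():
            if p_hat - z_c * se < p < p_hat + z_c * se:
                contained[level] += 1
    return {level: count / num_samples for level, count in contained.items()}


for n in (10, 20, 40, 100):
    print(n, simulate_intervals(n, p=0.6))
```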




What You Should Learn 


» How to interpret the chi-square 
distribution and use a 
chi-square distribution table 


» How to construct and interpret
confidence intervals for a 
population variance and 
standard deviation 


Study Tip 


The Greek letter χ is
pronounced “ki,” which
rhymes with the more familiar
Greek letter π.




6.4 Confidence Intervals for Variance and Standard Deviation


The Chi-Square Distribution • Confidence Intervals for σ² and σ


The Chi-Square Distribution 


In manufacturing, it is necessary to control the amount that a process varies. For 
instance, an automobile part manufacturer must produce thousands of parts to 
be used in the manufacturing process. It is important that the parts vary little 
or not at all. How can you measure, and consequently control, the amount of 
variation in the parts? You can start with a point estimate. 


DEFINITION 


The point estimate for σ² is s² and the point estimate for σ is s. The most
unbiased estimate for σ² is s².
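In Python's standard library, `statistics.variance` and `statistics.stdev` compute s² and s with the n − 1 divisor used for these point estimates, while `statistics.pvariance` divides by n instead. The data values below are made up purely for illustration.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

s2 = statistics.variance(data)  # point estimate for sigma^2 (divides by n - 1)
s = statistics.stdev(data)      # point estimate for sigma
print(f"s^2 = {s2:.4f}, s = {s:.4f}")

# For comparison, the population formula divides by n and gives a smaller value.
print(f"pvariance = {statistics.pvariance(data):.4f}")
```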


You can use a chi-square distribution to construct a confidence interval for
the variance and standard deviation.


DEFINITION

If a random variable x has a normal distribution, then the distribution of

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$

forms a chi-square distribution for samples of any size n > 1. Here are
several properties of the chi-square distribution.

1. All values of χ² are greater than or equal to 0.
2. The chi-square distribution is a family of curves, each determined by
the degrees of freedom. To form a confidence interval for σ², use the
chi-square distribution with degrees of freedom equal to one less than the
sample size.
   d.f. = n − 1   Degrees of freedom
3. The total area under each chi-square distribution curve is equal to 1.
4. The chi-square distribution is positively skewed and therefore the
distribution is not symmetric.
5. The chi-square distribution is different for each number of degrees of
freedom, as shown in the figure. As the degrees of freedom increase, the
chi-square distribution approaches a normal distribution.

[Figure: Chi-Square Distribution for Different Degrees of Freedom]
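The definition can be checked by simulation. The Python sketch below draws repeated samples from a normal population, computes (n − 1)s²/σ² for each one, and summarizes the results. It relies on a standard fact not stated above, namely that a chi-square distribution with d.f. degrees of freedom has mean equal to d.f.; the helper name `chi_square_statistics` and the choices n = 18 and σ = 2 are arbitrary and for illustration only.

```python
import random
import statistics


def chi_square_statistics(n, sigma, reps=20000, seed=0):
    """Simulate (n - 1)s^2 / sigma^2 for `reps` normal samples of size n."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        sum_sq = sum((x - mean) ** 2 for x in sample)  # equals (n - 1)s^2
        values.append(sum_sq / sigma ** 2)
    return values


values = chi_square_statistics(n=18, sigma=2.0)
print(f"min = {min(values):.3f}")                   # never negative (property 1)
print(f"mean = {statistics.fmean(values):.2f}")     # close to d.f. = 17
print(f"median = {statistics.median(values):.2f}")  # below the mean: positive skew
```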


SECTION 6.4 Confidence Intervals for Variance and Standard Deviation 353 


There are two critical values for each level of confidence. The value $\chi^2_R$
represents the right-tail critical value and $\chi^2_L$ represents the left-tail critical value.
Table 6 in Appendix B lists critical values of χ² for various degrees of freedom
and areas. Each area listed in the top row of the table represents the region under
the chi-square curve to the right of the critical value.


Study Tip
For chi-square critical
values with a c-confidence
level, the values shown
below, $\chi^2_L$ and $\chi^2_R$, are what
you look up in Table 6 in
Appendix B.


EXAMPLE 1

Finding Critical Values for χ²


Find the critical values $\chi^2_R$ and $\chi^2_L$ for a 95% confidence interval when the
sample size is 18.


SOLUTION
Because the sample size is 18,

d.f. = n − 1 = 18 − 1 = 17.   Degrees of freedom

The area to the right of $\chi^2_R$ is

$$\text{Area to the right of } \chi^2_R = \frac{1 - c}{2} = \frac{1 - 0.95}{2} = 0.025$$

and the area to the right of $\chi^2_L$ is

$$\text{Area to the right of } \chi^2_L = \frac{1 + c}{2} = \frac{1 + 0.95}{2} = 0.975.$$

A portion of Table 6 is shown. Using d.f. = 17 and the areas 0.975 and 0.025,
you can find the critical values, as shown by the highlighted areas in the table.
(Note that the top row in the table lists areas to the right of the critical value.
The entries in the table are critical values.)


[Table: portion of Table 6, critical values of χ² by degrees of freedom (rows) and area to the right of the critical value (columns); the highlighted entries for d.f. = 17 are 7.564 in the 0.975 column and 30.191 in the 0.025 column]

The area between the left and right critical values is c.


From the table, you can see that the critical values are

$$\chi^2_R = 30.191 \quad\text{and}\quad \chi^2_L = 7.564.$$


Interpretation So, for a chi-square distribution curve with 17 degrees of 
freedom, 95% of the area under the curve lies between 7.564 and 30.191, as 
shown in the figure at the left. 


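[Figure: chi-square curve with 17 degrees of freedom; $\chi^2_L = 7.564$, $\chi^2_R = 30.191$, with area 0.95 between them]

TRY IT YOURSELF 1

Find the critical values $\chi^2_R$ and $\chi^2_L$ for a 90% confidence interval when the
sample size is 30. Answer: Page A36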
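Table 6 is the intended tool for finding these critical values, but they can also be approximated numerically. The self-contained Python sketch below is an alternative under stated assumptions, not the book's method: it evaluates the chi-square cumulative distribution with the series for the regularized incomplete gamma function and inverts it by bisection. The names `chi2_cdf`, `chi2_ppf`, and `chi2_critical_values` are hypothetical helpers. (If SciPy is installed, `scipy.stats.chi2.ppf` returns the same quantiles.)

```python
import math


def chi2_cdf(x, df):
    """P(X <= x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 0.0
    a, y = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, y):
    # P(a, y) = y^a e^(-y) / Gamma(a) * sum_k y^k / (a (a + 1) ... (a + k))
    term = 1.0 / a
    total = term
    k = 1
    while term > total * 1e-15 and k < 100000:
        term *= y / (a + k)
        total += term
        k += 1
    return math.exp(-y + a * math.log(y) - math.lgamma(a)) * total


def chi2_ppf(p, df):
    """Return x with P(X <= x) = p, found by bisection on chi2_cdf."""
    lo, hi = 0.0, max(1.0, float(df))
    while chi2_cdf(hi, df) < p:  # widen the bracket until it contains x
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def chi2_critical_values(c, n):
    """Return (chi2_L, chi2_R) for a c-confidence interval with sample size n."""
    df = n - 1
    # chi2_R has area (1 - c)/2 to its right, so area (1 + c)/2 to its left.
    chi2_r = chi2_ppf((1 + c) / 2, df)
    # chi2_L has area (1 + c)/2 to its right, so area (1 - c)/2 to its left.
    chi2_l = chi2_ppf((1 - c) / 2, df)
    return chi2_l, chi2_r


chi2_l, chi2_r = chi2_critical_values(0.95, 18)
print(f"chi2_L = {chi2_l:.3f}, chi2_R = {chi2_r:.3f}")  # 7.564 and 30.191, as in Example 1
```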


Picturing the World


The Florida panther is one of the 
most endangered mammals on 
Earth. In the southeastern United 
States, the only breeding population 
(with an estimated population of 
about 100 to 180 panthers) can
be found on the southern tip of
Florida. Most of the panthers live 
in (1) the Big Cypress National 
Preserve, (2) Everglades National 
Park, and (3) the Florida Panther 
National Wildlife Refuge, as shown 
on the map. In a study of 7 female 
panthers, it was found that the 
mean litter size was 2.14 kittens, 
with a standard deviation of 0.69. 
(Source: Florida Fish and Wildlife 
Conservation Commission) 


Construct a 90% confidence 
interval for the standard deviation 
of the litter size for female Florida 
panthers. Assume the litter sizes 
are normally distributed. 


Confidence Intervals for σ² and σ


You can use the critical values $\chi^2_L$ and $\chi^2_R$ to construct confidence intervals for
a population variance and standard deviation. The best point estimate for the
variance is s² and the best point estimate for the standard deviation is s. Because
the chi-square distribution is not symmetric, the confidence interval for σ² cannot
be written as s² ± E. You must do separate calculations for the endpoints of the
confidence interval, as shown in the next definition.


DEFINITION

The c-confidence intervals for the population variance and standard deviation
are shown.

Confidence Interval for σ²:

$$\frac{(n-1)s^2}{\chi^2_R} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_L}$$

Confidence Interval for σ:

$$\sqrt{\frac{(n-1)s^2}{\chi^2_R}} < \sigma < \sqrt{\frac{(n-1)s^2}{\chi^2_L}}$$

The probability that the confidence intervals contain σ² or σ is c, assuming
that the estimation process is repeated a large number of times.


GUIDELINES

Constructing a Confidence Interval for a Variance and Standard Deviation

In Words | In Symbols
1. Verify that the population has a normal distribution. |
2. Identify the sample statistic n and the degrees of freedom. | d.f. = n − 1
3. Find the point estimate s². | $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
4. Find the critical values $\chi^2_R$ and $\chi^2_L$ that correspond to the given level of confidence c and the degrees of freedom. | Use Table 6 in Appendix B.
5. Find the left and right endpoints and form the confidence interval for the population variance. | $\frac{(n-1)s^2}{\chi^2_R} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_L}$
6. Find the confidence interval for the population standard deviation by taking the square root of each endpoint. | $\sqrt{\frac{(n-1)s^2}{\chi^2_R}} < \sigma < \sqrt{\frac{(n-1)s^2}{\chi^2_L}}$


Study Tip 


When you construct a 
confidence interval 
for a population variance or standard 
deviation, the general round-off rule
is to round off to the same number 
of decimal places as the sample 
variance or standard deviation. 


EXAMPLE 2

Constructing Confidence Intervals

You randomly select and weigh 30 samples of an allergy medicine. The sample 
standard deviation is 1.20 milligrams. Assuming the weights are normally 
distributed, construct 99% confidence intervals for the population variance 
and standard deviation. 

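SOLUTION

The area to the right of $\chi^2_R$ is

$$\text{Area to the right of } \chi^2_R = \frac{1 - c}{2} = \frac{1 - 0.99}{2} = 0.005$$

and the area to the right of $\chi^2_L$ is

$$\text{Area to the right of } \chi^2_L = \frac{1 + c}{2} = \frac{1 + 0.99}{2} = 0.995.$$

Using the values n = 30, d.f. = 29, and c = 0.99, the critical values $\chi^2_R$ and $\chi^2_L$ are

$$\chi^2_R = 52.336 \quad\text{and}\quad \chi^2_L = 13.121.$$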


Using these critical values and s = 1.20, the confidence interval for σ² is

$$\text{Left Endpoint: } \frac{(n-1)s^2}{\chi^2_R} = \frac{(30-1)(1.20)^2}{52.336} \approx 0.80$$

$$\text{Right Endpoint: } \frac{(n-1)s^2}{\chi^2_L} = \frac{(30-1)(1.20)^2}{13.121} \approx 3.18$$

So, 0.80 < σ² < 3.18.

The confidence interval for σ is

$$\text{Left Endpoint: } \sqrt{\frac{(30-1)(1.20)^2}{52.336}} \approx 0.89$$

$$\text{Right Endpoint: } \sqrt{\frac{(30-1)(1.20)^2}{13.121}} \approx 1.78$$

So, 0.89 < σ < 1.78.


You can check your answer using technology, as shown below using Minitab. 


Test and CI for One Variance
99% Confidence Intervals

Method        CI for StDev    CI for Variance
Chi-Square    (0.89, 1.78)    (0.80, 3.18)


Interpretation With 99% confidence, you can say that the population 
variance is between 0.80 and 3.18, and the population standard deviation is 
between 0.89 and 1.78 milligrams. 


TRY IT YOURSELF 2 


Construct the 90% and 95% confidence intervals for the population variance 
and standard deviation of the medicine weights. Answer: Page A36 


Note in Example 2 that the confidence interval for the population standard
deviation cannot be written as s ± E because the confidence interval does not
have s as its center. (The same is true for the population variance.)
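The endpoint arithmetic in Example 2 can be scripted. In this minimal Python sketch the critical values are passed in from Table 6 rather than computed, and `variance_sd_ci` is a hypothetical name; rounding to two decimal places follows the round-off rule in the Study Tip.

```python
import math


def variance_sd_ci(n, s, chi2_r, chi2_l):
    """c-confidence intervals for sigma^2 and sigma from n, s, chi2_R, chi2_L."""
    ss = (n - 1) * s ** 2               # (n - 1)s^2
    var_ci = (ss / chi2_r, ss / chi2_l)  # the larger critical value gives the left endpoint
    sd_ci = (math.sqrt(var_ci[0]), math.sqrt(var_ci[1]))  # square root of each endpoint
    return var_ci, sd_ci


var_ci, sd_ci = variance_sd_ci(n=30, s=1.20, chi2_r=52.336, chi2_l=13.121)
print(f"{var_ci[0]:.2f} < sigma^2 < {var_ci[1]:.2f}")  # 0.80 < sigma^2 < 3.18
print(f"{sd_ci[0]:.2f} < sigma < {sd_ci[1]:.2f}")      # 0.89 < sigma < 1.78
```

Note that the interval is not centered at s = 1.20, which is the point made in the note above.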




6.4 EXERCISES 


[Table for Exercise 16: Final exam scores (data not recoverable)]


Building Basic Skills and Vocabulary 


1. Does a population have to be normally distributed in order to use the 
chi-square distribution? 


2. What happens to the shape of the chi-square distribution as the degrees of 
freedom increase? 


Finding Critical Values for χ² In Exercises 3-8, find the critical values $\chi^2_R$
and $\chi^2_L$ for the level of confidence c and sample size n.

3. c = 0.90, n = 8     4. c = 0.99, n = 15
5. c = 0.95, n = 20    6. c = 0.98, n = 26
7. c = 0.99, n = 30    8. c = 0.80, n = 51


In Exercises 9-12, construct the indicated confidence intervals for (a) the
population variance σ² and (b) the population standard deviation σ. Assume the
sample is from a normally distributed population.

9. c = 0.95, s² = 11.56, n = 30    10. c = 0.99, s² = 0.64, n = 7
11. c = 0.90, s = 35, n = 18       12. c = 0.98, s = 278.1, n = 41


Using and Interpreting Concepts 


Constructing Confidence Intervals In Exercises 13-24, assume the
sample is from a normally distributed population and construct the indicated
confidence intervals for (a) the population variance σ² and (b) the population
standard deviation σ. Interpret the results.


13. Bottles The heights (in centimeters) of 18 randomly selected bottles produced 
by a machine are listed. Use a 90% level of confidence. 


19.861 18.462 18.591 18.684 19.191 19.985 19.549
19.631 18.909 19.101 19.845 19.863 18.645 19.111 
18.999 18.959 19.769 19.771 


14. Vitamin D Tablets The quantities (in thousand IUs) of Vitamin D in 15
randomly selected supplement tablets are listed. Use a 95% level of confidence.

5.256 5.218 5.236 4.813 4.998 5.011 5.861
4.121 4.343 5.863 5.791 5.011 4.985 4.862
5.682


15. Earnings The annual earnings (in thousands of dollars) of 21 randomly
selected clinical pharmacists are listed. Use a 99% level of confidence.
(Adapted from Salary.com)

91.8 90.6 101.5 119.2 110.5 117.0 138.6
112.1 136.6 123.6 111.4 80.5 105.7 99.9
138.3 113.6 81.4 89.4 94.8 146.6 106.6


16. Final Exam Scores The final exam scores of 24 randomly selected
students in a statistics class are shown in the table at the left. Use a 95% 
level of confidence. 


17. Space Shuttle Flights The durations (in days) of 14 randomly selected space 
shuttle flights have a sample standard deviation of 3.54 days. Use a 99% level 
of confidence. (Source: NASA) 


[Figure for Exercise 19: Water quality survey; n = 19, s = 15 grains/gallon]

[Figure for Exercise 20: survey question “How much will you pay for your site?”; s = $3600 (sample size not recoverable)]


18. College Football The numbers of touchdowns scored by 11 randomly
selected NCAA Division I Subdivision teams in a recent season have a
sample standard deviation of 9.35. Use an 80% level of confidence. (Source:
National Collegiate Athletic Association)

19. Water Quality As part of a water quality survey, you test the water
hardness in several randomly selected streams. The results are shown in the
figure at the left. Use a 95% level of confidence.

20. Website Costs As part of a survey, you ask a random sample of business
owners how much they would be willing to pay for a website for their
company. The results are shown in the figure at the left. Use a 90% level of
confidence.

21. Inverter Batteries The reserve capacities (in hours) of 18 randomly
selected inverter batteries have a sample standard deviation of 0.40 hour.
Use a 90% level of confidence.

22. Maximum Daily Temperature The record high daily temperatures
(in degrees Fahrenheit) of a random sample of 64 days of the year in Grand
Junction, Colorado, have a sample standard deviation of 16.8°F. Use a 98%
level of confidence. (Source: NOAA)

23. Waiting Times The waiting times (in minutes) of a random sample of 26
people at a book-signing event have a sample standard deviation of 4.9
minutes. Use a 99% level of confidence.

24. Smartphones The prices of a random sample of 23 new smartphones have
a sample standard deviation of $1,200. Use an 80% level of confidence.


Extending Concepts 


25. Bottle Heights You are analyzing the sample of bottles in Exercise 13. The
population standard deviation of the bottles’ heights should be less than
0.35 centimeter. Does the confidence interval you constructed for σ suggest
that the variation in the bottles’ heights is at an acceptable level? Explain
your reasoning.

26. Vitamin D Amounts You are analyzing the sample of Vitamin D Tablets
in Exercise 14. The population standard deviation of the amount of Vitamin
D in the tablets should be less than 0.50 thousand IUs. Does the confidence
interval you constructed for σ suggest that the variation in the amounts of
Vitamin D in the tablets is at an acceptable level? Explain your reasoning.

27. Battery Reserve Capacities You are analyzing the sample of inverter
batteries in Exercise 21. The population standard deviation of the batteries’
reserve capacities should be less than 0.50 hour. Does the confidence interval
you constructed for σ suggest that the variation in the batteries’ reserve
capacities is at an acceptable level? Explain your reasoning.

28. Waiting Times You are analyzing the sample of waiting times in Exercise 23.
The population standard deviation of the waiting times should be less than
7.9 minutes. Does the confidence interval you constructed for σ suggest that
the variation in the waiting times is at an acceptable level? Explain your
reasoning.

29. In your own words, explain how finding a confidence interval for a
population variance is different from finding a confidence interval for a
population mean or proportion.


USES AND ABUSES | Statistics in the Real World

[Diagram: Registered voters]


Uses 


By now, you know that complete information about population parameters is 
often not available. The techniques of this chapter can be used to make interval 
estimates of these parameters so that you can make informed decisions. 

From what you learned in this chapter, you know that point estimates 
(sample statistics) of population parameters are usually close but rarely equal to 
the actual values of the parameters they are estimating. Remembering this can 
help you make good decisions in your career and in everyday life. For instance, 
the results of a survey tell you that 52% of registered voters plan to vote in favor 
of the rezoning of a portion of a town from residential to commercial use. You 
know that this is only a point estimate of the actual proportion that will vote 
in favor of rezoning. If the margin of error is 3%, then the interval estimate is 
0.49 < p < 0.55 and it is possible that the item will not receive a majority vote. 


Abuses 


Unrepresentative Samples There are many ways that surveys can result in 
incorrect predictions. When you read the results of a survey, remember to 
question the sample size, the sampling technique, and the questions asked. For 
instance, you want to know the proportion of people who will vote in favor of 
rezoning. From the diagram at the left, you can see that even when your sample 
is large enough, it may not consist of people who are likely to vote. 


Biased Survey Questions In surveys, it is also important to analyze the wording 
of the questions. For instance, the question about rezoning might be presented 
as: “Knowing that rezoning will result in more businesses contributing to school 
taxes, would you support the rezoning?” 


Misinterpreted Polls Some political pundits and voters vowed never to trust 
polls again after they failed to predict Donald Trump’s win over Hillary Clinton 
in the 2016 U.S. presidential election. However, nationwide polls the week of the
election were only off by about 1%: the polls showed Clinton ahead by about
3% and she ended up ahead in votes by about 2%.

Many state polls were inaccurate, most of them in the same direction, with 
Trump receiving up to 10% more of the vote than expected in some states. 
This was enough to give him the majority of electoral votes and the presidency. 
Analysts are still debating the reasons so many state polls were unrepresentative 
of the people who actually voted. 


EXERCISES 


1. Unrepresentative Samples Find an example of a survey that is reported in a 
newspaper, in a magazine, or on a website. Describe different ways that the 
sample could have been unrepresentative of the population. 


2. Biased Survey Questions Find an example of a survey that is reported in a 
newspaper, in a magazine, or on a website. Describe different ways that the 
survey questions could have been biased. 


3. Misinterpreted Polls Determine whether each state election poll below was 
misleading. Assume the margin of error is 4% for each poll. 
(a) Michigan poll leader: Clinton by 3.4%; Election winner: Trump by 0.3% 
(b) Wisconsin poll leader: Clinton by 6.5%; Election winner: Trump by 0.7% 




Chapter Summary 359 


6 Chapter Summary 


Review
What Did You Learn? | Example(s) | Review Exercises

Section 6.1
» How to find a point estimate and a margin of error | 1, 2 | 1, 2
  $E = z_c \frac{\sigma}{\sqrt{n}}$   Margin of error
» How to construct and interpret confidence intervals for a population mean when σ is known | 3-5 | 3-6
  $\bar{x} - E < \mu < \bar{x} + E$
» How to determine the minimum sample size required when estimating a population mean | 6 | 7, 8

Section 6.2
» How to interpret the t-distribution and use a t-distribution table | 1 | 9-12
  $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}, \quad \text{d.f.} = n - 1$
» How to construct and interpret confidence intervals for a population mean when σ is not known | 2-4 | 13-18
  $\bar{x} - E < \mu < \bar{x} + E, \quad E = t_c \frac{s}{\sqrt{n}}$

Section 6.3
» How to find a point estimate for a population proportion | 1 | 19-24
  $\hat{p} = \frac{x}{n}$
» How to construct and interpret confidence intervals for a population proportion | 2, 3 | 19-24
  $\hat{p} - E < p < \hat{p} + E, \quad E = z_c\sqrt{\frac{\hat{p}\hat{q}}{n}}$
» How to determine the minimum sample size required when estimating a population proportion | 4 | 25, 26

Section 6.4
» How to interpret the chi-square distribution and use a chi-square distribution table | 1 | 27-30
  $\chi^2 = \frac{(n-1)s^2}{\sigma^2}, \quad \text{d.f.} = n - 1$
» How to construct and interpret confidence intervals for a population variance and standard deviation | 2 | 31, 32
  $\frac{(n-1)s^2}{\chi^2_R} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_L}, \quad \sqrt{\frac{(n-1)s^2}{\chi^2_R}} < \sigma < \sqrt{\frac{(n-1)s^2}{\chi^2_L}}$




Systolic blood pressures (in mmHg)


125 80 118 130 95 
108 96 134 92 103 
93 104 140 124 131 
98 123 97 132 145 
155 162 87 99 154 
155 123 129 96 122 


TABLE FOR EXERCISE 1 


6 Review Exercises 


Section 6.1 


1. The systolic blood pressures (in mmHg) of 40 persons are shown in
the table. Assume the population standard deviation is 25 mmHg. Find
(a) the point estimate of the population mean μ and (b) the margin of
error for a 90% confidence interval.


2. The ages (in completed years) of 30 persons attending a course are
shown below. Assume the population standard deviation is 10 years.
Find (a) the point estimate of the population mean μ and (b) the margin
of error for a 95% confidence interval.


Ages (in completed years) 


15 16 16 14 21 22 19 29 30 14 34 32 43 52 32 
12 17 18 15 16 23 20 27 10 23 33 29 36 24 16 


3. Construct a 90% confidence interval for the population mean in Exercise 1. 
Interpret the results. 


4. Construct a 95% confidence interval for the population mean in Exercise 2. 
Interpret the results. 


In Exercises 5 and 6, use the confidence interval to find the margin of error and 
the sample mean. 


5. (30.25, 42.50) 6. (7.428, 7.562) 


7. Determine the minimum sample size required to be 95% confident that the 
sample mean systolic blood pressure is within 8 mmHg of the population mean 
systolic blood pressure. Use the population standard deviation from Exercise 1. 


8. Determine the minimum sample size required to be 99% confident that the 
sample mean age is within 3 years of the population mean age. Use the population 
standard deviation from Exercise 2. 


Section 6.2 


In Exercises 9-12, find the critical value $t_c$ for the level of confidence c and sample
size n.

9. c = 0.90, n = 12     10. c = 0.95, n = 24
11. c = 0.80, n = 16    12. c = 0.99, n = 30
In Exercises 13-16, (a) find the margin of error for the values of c, s, and n, and
(b) construct the confidence interval for μ using the t-distribution. Assume the
population is normally distributed.

13. c = 0.80, s = 27.4, n = 36, x̄ = 81.6
14. c = 0.95, s = 1.1, n = 25, x̄ = 3.5
15. c = 0.90, s = 3.6, n = 20, x̄ = 20.6
16. c = 0.99, s = 16.5, n = 20, x̄ = 25.2


17. In a random sample of 36 top-rated roller coasters, the average height is 
165 feet and the standard deviation is 67 feet. Construct a 90% confidence 
interval for w. Interpret the results. (Source: POP World Media, LLC) 


18. You research the heights of top-rated roller coasters and find that the 
population mean is 160 feet. In Exercise 17, does the t-value fall between
$-t_{0.95}$ and $t_{0.95}$?


Review Exercises 361 


Section 6.3 


In Exercises 19-22, let p be the population proportion for the situation. (a) Find 
point estimates of p and q, (b) construct 90% and 95% confidence intervals for p, 
and (c) interpret the results of part (b) and compare the widths of the confidence 
intervals. 


19. In a survey of 1035 U.S. adults, 745 say they want the U.S. to play a leading 
or major role in global affairs. (Adapted from Gallup) 

20. In a survey of 1003 U.S. adults, 451 believe that for a person to be considered
truly American, it is very important that he or she share American customs 
and traditions. (Adapted from Pew Research Center) 


21. In a survey of 2202 U.S. adults, 1167 think antibiotics are effective against
viral infections. (Adapted from The Harris Poll) 


22. In a survey of 2223 U.S. adults, 1334 say an occupation as an athlete is 
prestigious. (Adapted from The Harris Poll) 


23. In Exercise 19, does it seem possible that the population proportion could 
equal 0.75? Explain. 
24. In Exercise 22, does it seem possible that the population proportion could be 
within 1% of the point estimate? Explain. 
25. You wish to estimate, with 95% confidence, the population proportion of 
U.S. adults who have taken or planned to take a winter vacation in a recent 
year. Your estimate must be accurate within 5% of the population proportion. 
(a) No preliminary estimate is available. Find the minimum sample size 
needed. 

(b) Find the minimum sample size needed, using a prior study that found 
that 32% of U.S. adults have taken or planned to take a winter vacation 
in a recent year. (Source: Rasmussen Reports) 


(c) Compare the results from parts (a) and (b). 


26. In Exercise 25(b), would a sample size of 369 be acceptable? Explain. 
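The minimum sample size in Exercises 25 and 26 comes from n = p̂q̂(z_c/E)², rounded up to the next whole number, with p̂ = q̂ = 0.5 when no preliminary estimate is available. Here is a minimal Python sketch of that formula, assuming Python 3.8+ for `statistics.NormalDist`. The function name `min_sample_size_proportion` and the margin E = 0.03 in the usage line are illustrative choices, not values from the exercises.

```python
import math
from statistics import NormalDist


def min_sample_size_proportion(c, E, p_hat=None):
    """Minimum n so a level-c interval for p has margin of error at most E.

    With no preliminary estimate, use p_hat = q_hat = 0.5, which makes
    p_hat * q_hat as large as possible (the most conservative choice).
    """
    if p_hat is None:
        p_hat = 0.5
    q_hat = 1 - p_hat
    z_c = NormalDist().inv_cdf((1 + c) / 2)  # two-tailed critical value
    n = p_hat * q_hat * (z_c / E) ** 2
    return math.ceil(n)  # always round up to the next whole number


# Hypothetical: 95% confidence, within 3%, no preliminary estimate
print(min_sample_size_proportion(0.95, 0.03))  # 1068
```

Rounding up (not to the nearest integer) guarantees the margin of error is no larger than E.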


Section 6.4 


In Exercises 27-30, find the critical values χ²_R and χ²_L for the level of confidence c
and sample size n.


27. c = 0.90, n = 15    28. c = 0.98, n = 25
29. c = 0.95, n = 20    30. c = 0.99, n = 10


In Exercises 31 and 32, assume the sample is from a normally distributed
population and construct the indicated confidence intervals for (a) the population
variance σ² and (b) the population standard deviation σ. Interpret the results.


31. The maximum wind speeds (in knots) of 13 randomly selected hurricanes 
that have hit the U.S. mainland are listed. Use a 95% level of confidence. 
(Source: National Oceanic & Atmospheric Administration) 


70 85 70 75 100 100 110 105 130 75 85 75 70 


32. The acceleration times (in seconds) from 0 to 60 miles per hour for
33 randomly selected sedans are listed. Use a 98% level of confidence. 
(Source: Zero to 60 Times) 


6.5 5.0 5.2 3.3 6.6 6.3 5.1 5.3 5.4 9.5 7.5
4.5 5.8 8.6 6.9 8.1 6.0 6.7 7.9 8.8 7.1 7.9
7.2 18.4 9.1 6.8 12.5 4.2 7.1 9.9 9.5 2.8 4.9




TABLE FOR EXERCISE 1
Women’s Open Division winning times (in hours)

3.36  3.45  3.50  3.14  2.79
2.79  2.75  2.59  2.45  2.38
2.57  2.42  2.41  2.42  2.41
2.42  2.45  2.44  2.39  2.44
2.40  2.35  2.41  2.39  2.49
2.42  2.54  2.44  2.44  2.49


6 Chapter Quiz 


Take this quiz as you would take a quiz in class. After you are done, check your 
work against the answers given in the back of the book. 


1. The winning times (in hours) for a sample of 30 randomly selected
Boston Marathon Women’s Open Division champions are shown in the 
table at the left. (Source: Boston Athletic Association) 


(a) Find the point estimate of the population mean. 
(b) Find the margin of error for a 95% confidence level. 


(c) Construct a 95% confidence interval for the population mean. 
Interpret the results. 


(d) Does it seem possible that the population mean could be greater 
than 2.75 hours? Explain. 


2. You wish to estimate the mean winning time for Boston Marathon Women’s 
Open Division champions. The estimate must be within 0.13 hour of the 
population mean. Determine the minimum sample size required to construct 
a 99% confidence interval for the population mean. Use the population 
standard deviation from Exercise 1. 


3. The data set represents the amounts of time (in minutes) spent checking email 
for a random sample of employees at a company. 


7.5 2.0 12.1 8.8 9.4 7.3 1.9 2.8 7.0 7.3
(a) Find the sample mean and the sample standard deviation. 


(b) Construct a 90% confidence interval for the population mean. Interpret 
the results. Assume the times are normally distributed. 


(c) Repeat part (b), assuming σ = 3.5 minutes. Compare the results.


4. In a random sample of 12 senior-level chemical engineers, the mean annual 
earnings was $133,326 and the standard deviation was $36,729. Assume the 
annual earnings are normally distributed and construct a 95% confidence 
interval for the population mean annual earnings for senior-level chemical 
engineers. Interpret the results. (Adapted from Salary.com) 


5. You research the salaries of senior-level chemical engineers and find that
the population mean is $131,935. In Exercise 4, does the t-value fall between
−t_0.95 and t_0.95?


6. In a survey of 1018 U.S. adults, 753 say that the energy situation in the United
States is very or fairly serious. (Adapted from Gallup) 


(a) Find the point estimate for the population proportion. 


(b) Construct a 90% confidence interval for the population proportion. 
Interpret the results. 


(c) Does it seem possible that the population proportion could be between 
90% and 95% of the point estimate? Explain. 


(d) Find the minimum sample size needed to estimate the population 
proportion at the 99% confidence level in order to ensure that the 
estimate is accurate within 4% of the population proportion. 


7. Refer to the data set in Exercise 3. Assume the population of times spent 
checking email is normally distributed. Construct a 95% confidence interval 
for (a) the population variance and (b) the population standard deviation. 
Interpret the results. 




6 Chapter Test 


Take this test as you would take a test in class. 


1. In a survey of 2096 U.S. adults, 1740 think football teams of all levels should
require players who suffer a head injury to take a set amount of time off from 
playing to recover. (Adapted from The Harris Poll) 

(a) Find the point estimate for the population proportion. 

(b) Construct a 95% confidence interval for the population proportion. 
Interpret the results. 

(c) Does it seem possible that the population proportion could be within 99% 
of the point estimate? Explain. 


(d) Find the minimum sample size needed to estimate the population 
proportion at the 99% confidence level in order to ensure that the 
estimate is accurate within 3% of the population proportion. 


2. The data set represents the weights (in pounds) of 10 randomly selected 
black bears from northeast Pennsylvania. Assume the weights are normally 
distributed. (Source: Pennsylvania Game Commission) 


170 225 183 137 287 191 268 185 211 284 
(a) Find the sample mean and the sample standard deviation. 


(b) Construct a 95% confidence interval for the population mean. Interpret 
the results. 


(c) Construct a 99% confidence interval for the population standard 
deviation. Interpret the results. 


3. The data set represents the scores of 12 randomly selected students on the 
SAT Physics Subject Test. Assume the population test scores are normally 
distributed and the population standard deviation is 104. (Adapted from 
The College Board) 


590 450 490 680 380 500 570 620 640 530 780 720 


(a) Find the point estimate of the population mean. 
(b) Construct a 90% confidence interval for the population mean. Interpret
the results.


(c) Does it seem possible that the population mean could equal 667? Explain. 


(d) Determine the minimum sample size required to be 95% confident that 
the sample mean test score is within 10 points of the population mean test 
score. 


4. Use the standard normal distribution or the t-distribution to construct the
indicated confidence interval for the population mean of each data set. Justify 
your decision. If neither distribution can be used, explain why. Interpret the 
results. 


(a) In a random sample of 40 patients, the mean waiting time at a dentist’s
office was 20 minutes and the standard deviation was 7.5 minutes. 
Construct a 95% confidence interval for the population mean. 


(b) In a random sample of 15 cereal boxes, the mean weight was 11.89 ounces.
Assume the weights of the cereal boxes are normally distributed and the 
population standard deviation is 0.05 ounce. Construct a 90% confidence 
interval for the population mean. 
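Exercise 3(d) of the Chapter Test uses the sample-size formula for a mean with σ known, n = (z_c σ / E)², again rounded up to the next whole number. A minimal sketch, assuming Python 3.8+; the name `min_sample_size_mean` and the inputs σ = 15 and E = 2 are hypothetical.

```python
import math
from statistics import NormalDist


def min_sample_size_mean(c, sigma, E):
    """Minimum n so that x-bar is within E of mu with confidence c (sigma known)."""
    z_c = NormalDist().inv_cdf((1 + c) / 2)  # two-tailed critical value
    return math.ceil((z_c * sigma / E) ** 2)  # round up, never down


# Hypothetical values: sigma = 15, E = 2, 95% confidence
print(min_sample_size_mean(0.95, 15, 2))  # 217
```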


Putting it all together 


REAL DECISIONS 


The Safe Drinking Water Act, which was passed in 1974, allows the
Environmental Protection Agency (EPA) to regulate the levels of
contaminants in drinking water. The EPA requires that water utilities
give their customers water quality reports annually. These reports
include the results of daily water quality monitoring, which is performed
to determine whether drinking water is safe for consumption.

A water department tests for contaminants at water treatment plants
and at customers’ taps. These contaminants include microorganisms,
organic chemicals, and inorganic chemicals, such as cyanide. Cyanide’s
presence in drinking water is the result of discharges from steel, plastics,
and fertilizer factories. For drinking water, the maximum contaminant
level of cyanide is 0.2 part per million.
As part of your job for your city’s water department, you are
preparing a report that includes an analysis of the results shown in
the figure at the right. The figure shows the point estimates for the
population mean concentration and the 95% confidence intervals for
μ for cyanide over a three-year period. The data are based on random
water samples taken by the city’s three water treatment plants.

[Figure: Cyanide, point estimates and 95% confidence intervals for μ for Year 1, Year 2, and Year 3]


EXERCISES


1. Interpreting the Results 


Use the figure to determine whether there has been a change in the 
mean concentration level of cyanide for each time period. Explain 
your reasoning. 


(a) From Year 1 to Year 2 (b) From Year 2 to Year 3 
(c) From Year 1 to Year 3 


2. What Can You Conclude? 
Using the results of Exercise 1, what can you conclude about the 
concentrations of cyanide in the drinking water? 

3. What Do You Think?
The confidence interval for Year 2 is much wider than the intervals for the other years.
What do you think may have caused this wider confidence interval?

4. How Can You Improve the Report? 


What can the water department do to decrease the size of the 
confidence intervals, regardless of the amount of variance in cyanide 
levels? 


5. How Do You Think They Did It? 


How do you think the water department constructed the 95% 
confidence intervals for the population mean concentration of 
cyanide in the water? Include answers to the questions below in your 
explanation. 


(a) What sampling distribution do you think they used? Why? 


(b) Do you think they used the population standard deviation in 
calculating the margin of error? Why or why not? If not, what 
could they have used? 




TECHNOLOGY 


United States Foreign Policy Polls 
THE GALLUP ORGANIZATION 


www.gallup.com 


Since 1935, the Gallup Organization has conducted public opinion polls in the United States and 
around the world. The table shows the results of four polls of randomly selected U.S. adults from 2015 
through 2017. The remaining percentages not shown in the results are adults who were not sure. 


Question | Results | Number polled
Do you think the U.S. made a mistake sending troops to Iraq? | Yes: 51%; No: 46% | 1527
In the Middle East situation, are your sympathies more with the Israelis or the Palestinians? | Israelis: 62%; Palestinians: 19% | 1035
Do you have a favorable or unfavorable opinion of Russian president Vladimir Putin? | Favorable: 22%; Unfavorable: 72% | 1035
Should the NATO alliance be maintained or is it not necessary anymore? | Should be maintained: 80%; Not necessary: 16% | 485
1. Use technology to find a 95% confidence interval for the population proportion
of adults who
(a) think sending troops to Iraq was a mistake.
(b) sympathize more with the Israelis than the Palestinians.
(c) have a favorable opinion of Vladimir Putin.
(d) think the NATO alliance is not necessary.
(e) do not sympathize with either the Israelis or the Palestinians more than the other.

2. Find the minimum sample size needed to estimate, with 95% confidence, the
population proportion of adults who have a favorable opinion of Vladimir Putin.
Your estimate must be accurate within 2% of the population proportion.

3. Use technology to simulate a poll. Assume that the actual population proportion
of adults who think the U.S. made a mistake sending troops to Iraq is 54%.
Run the simulation several times using n = 1527.
(a) What was the least value you obtained for p̂?
(b) What was the greatest value you obtained for p̂?

[Screenshot: Random Number Generation dialog; Number of Random Numbers: 200; Distribution: Binomial; Parameters: p value = 0.54]

4. Is it probable that the population proportion of adults who think the U.S. made
a mistake sending troops to Iraq is 54%? Explain your reasoning.
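Exercise 3 asks you to simulate polls. The dialog shown draws binomial counts; as a stand-in, this Python sketch simulates each poll as n independent trials with success probability p and reports p̂ = x/n. The function name, the fixed seed, and the choice of 200 runs (matching the 200 random numbers in the dialog) are assumptions for illustration.

```python
import random


def simulate_polls(p, n, runs, seed=0):
    """Return the sample proportions p-hat = x / n from `runs` simulated polls.

    Each poll counts successes in n independent trials with success
    probability p, so each count x is a binomial random variable.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    p_hats = []
    for _ in range(runs):
        x = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(x / n)
    return p_hats


p_hats = simulate_polls(p=0.54, n=1527, runs=200)
print(f"least p-hat:    {min(p_hats):.3f}")
print(f"greatest p-hat: {max(p_hats):.3f}")
```

Comparing the spread of these simulated p̂ values with the poll’s 51% is one way to approach Exercise 4.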


Extended solutions are given in the technology manuals that accompany this text. 
Technical instruction is provided for Minitab, Excel, and the TI-84 Plus. 




6 Using Technology to Construct Confidence Intervals 


Here are some Minitab and TI-84 Plus printouts for some examples in this chapter. 
Answers may be slightly different because of rounding. 


[Minitab screenshots (three): the Basic Statistics menu, listing Display Descriptive Statistics, Store Descriptive Statistics, Graphical Summary, 1-Sample Z, 1-Sample t, 2-Sample t, Paired t, 1 Proportion, and 2 Proportions]


See Example 3, page 323. 


19 25 15 21 22 20 20 22 22 21 
21 23 22 16 21 18 25 23 23 21 
22 24 18 19 23 20 19 19 24 25 
17 21 21 25 23 18 22 20 21 21 


One-Sample Z: Hours
The assumed standard deviation = 2.3

Variable   N    Mean  StDev  SE Mean  95% CI
Hours     40  21.050  2.438    0.364  (20.337, 21.763)
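The Minitab printout can be checked by hand: the margin of error is E = z_c σ/√n and the interval is x̄ ± E. A minimal Python sketch, assuming Python 3.8+ for `statistics.NormalDist`; `z_interval` is a hypothetical helper name. The second call reuses it for a summary-statistics case and assumes σ = 1.5, the value that reproduces the TI-84 ZInterval for Example 5.

```python
import math
from statistics import NormalDist, mean, stdev


def z_interval(x_bar, sigma, n, c):
    """c-level confidence interval for mu when sigma is known."""
    z_c = NormalDist().inv_cdf((1 + c) / 2)  # about 1.96 when c = 0.95
    E = z_c * sigma / math.sqrt(n)           # margin of error
    return x_bar - E, x_bar + E


# Example 3 data (hours), as listed above
hours = [
    19, 25, 15, 21, 22, 20, 20, 22, 22, 21,
    21, 23, 22, 16, 21, 18, 25, 23, 23, 21,
    22, 24, 18, 19, 23, 20, 19, 19, 24, 25,
    17, 21, 21, 25, 23, 18, 22, 20, 21, 21,
]
x_bar = mean(hours)
print(len(hours), round(x_bar, 3), round(stdev(hours), 3))  # 40 21.05 2.438
lo, hi = z_interval(x_bar, sigma=2.3, n=len(hours), c=0.95)
print(f"({lo:.3f}, {hi:.3f})")                               # (20.337, 21.763)

# Summary statistics only (Example 5 on the TI-84); sigma = 1.5 is assumed
lo5, hi5 = z_interval(22.9, sigma=1.5, n=20, c=0.90)
print(f"({lo5:.3f}, {hi5:.3f})")                             # (22.348, 23.452)
```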


See Example 2, page 334. 


One-Sample T
 N    Mean  StDev  SE Mean  95% CI
16  162.00  10.00     2.50  (156.67, 167.33)


See Example 2, page 344. 


Test and Cl for One Proportion 


Sample  x     N     Sample p  95% CI
1       1054  1550  0.680000  (0.656130, 0.703186)




See Example 5, page 325. 


TI-84 PLUS
STAT > TESTS > 7: ZInterval...

TI-84 PLUS
ZInterval
Inpt: Stats
σ: 1.5
x̄: 22.9
n: 20
C-Level: .9
Calculate

TI-84 PLUS
ZInterval
(22.348, 23.452)
x̄ = 22.9
n = 20


See Example 3, page 335. 


TI-84 PLUS
STAT > TESTS > 8: TInterval...

TI-84 PLUS
TInterval
Inpt: Stats
x̄: 9.75
Sx: 2.39
n: 36
C-Level: .99
Calculate

TI-84 PLUS
TInterval
(8.665, 10.835)
x̄ = 9.75
Sx = 2.39
n = 36
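When σ is unknown, the interval is x̄ ± t_c s/√n with d.f. = n − 1. Python’s standard library has no t-distribution, so this sketch takes t_c from a t-table as an input (with SciPy installed, `scipy.stats.t.ppf((1 + c) / 2, n - 1)` would compute it). `t_interval` is a hypothetical helper; the table values 2.724 (d.f. = 35, c = 0.99) and 2.131 (d.f. = 15, c = 0.95) are standard t-table entries.

```python
import math


def t_interval(x_bar, s, n, t_c):
    """Confidence interval for mu when sigma is unknown: x_bar +/- t_c * s / sqrt(n)."""
    E = t_c * s / math.sqrt(n)  # margin of error
    return x_bar - E, x_bar + E


# Example 3 (TI-84): c = 0.99, d.f. = 35, t_c = 2.724 from a t-table
lo, hi = t_interval(9.75, 2.39, 36, t_c=2.724)
print(f"({lo:.3f}, {hi:.3f})")    # (8.665, 10.835)

# Example 2 (Minitab): c = 0.95, d.f. = 15, t_c = 2.131 from a t-table
lo2, hi2 = t_interval(162.0, 10.0, 16, t_c=2.131)
print(f"({lo2:.2f}, {hi2:.2f})")  # (156.67, 167.33)
```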


See Example 2, page 344. 


TI-84 PLUS
STAT > TESTS > A: 1-PropZInt...

TI-84 PLUS
1-PropZInt
x: 1054
n: 1550
C-Level: .95
Calculate

TI-84 PLUS
1-PropZInt
(.65678, .70322)
p̂ = .68
n = 1550



CHAPTER 7


Hypothesis Testing
with One Sample


The Entertainment Software Rating Board (ESRB) assigns ratings to video games to 
indicate the appropriate ages for players. These ratings include EC (early childhood), 
E (everyone), E10+ (everyone 10+), T (teen), M (mature), and AO (adults only).




7.1 Introduction to Hypothesis Testing
7.2 Hypothesis Testing for the Mean (σ Known)
7.3 Hypothesis Testing for the Mean (σ Unknown)
Activity
Case Study
7.4 Hypothesis Testing for Proportions
Activity
7.5 Hypothesis Testing for Variance and Standard Deviation

Uses and Abuses 
Real Statistics—Real Decisions 
Technology 


Where You’ve Been


In Chapter 6, you began your study of inferential statistics. 
There, you learned how to form a confidence interval to 
estimate a population parameter, such as the proportion 
of people in the United States who agree with a certain 
statement. For instance, in a nationwide poll conducted by 
Pew Research Center, 2001 U.S. adults were asked whether 
they agreed or disagreed with the statement, “People who 
play violent video games are more likely to be violent 
themselves.” Out of those surveyed, 800 adults agreed with 
the statement. 


You have learned how to use these results to state with 95%
confidence that the population proportion of U.S. adults
who agree that people who play violent video games are
more likely to be violent themselves is between 37.9%
and 42.1%.


Where You’re Going


In this chapter, you will continue your study of inferential
statistics. But now, instead of making an estimate about
a population parameter, you will learn how to test a claim
about a parameter.


For instance, suppose that you work for Pew Research 
Center and are asked to test a claim that the proportion of 
U.S. adults who agree that people who play violent video 
games are more likely to be violent themselves is p = 0.35. 
To test the claim, you take a random sample of n = 2001 
U.S. adults and find that 800 of them think that people 
who play violent video games are more likely to be violent 
themselves. Your sample statistic is p̂ ≈ 0.400.




Is your sample statistic different enough from the claim 
(p = 0.35) to decide that the claim is false? The answer lies 
in the sampling distribution of sample proportions taken 
from a population in which p = 0.35. The figure below 
shows that your sample statistic is more than 4 standard 
errors from the claimed value. If the claim is true, then the 
probability of the sample statistic being 4 standard errors or 
more from the claimed value is extremely small. Something 
is wrong! If your sample was truly random, then you can 
conclude that the actual proportion of the adult population 
is not 0.35. In other words, you tested the original claim 
(hypothesis), and you decided to reject it. 


[Figure: Sampling Distribution of p̂ when p = 0.35 (the claim), with a p̂ axis from 0.29 to 0.41 and a z axis from −6 to 6; the sample statistic p̂ ≈ 0.400 is marked at z = 4.69]
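The z-score of 4.69 can be checked directly. If p = 0.35, the standard error of p̂ is √(pq/n), and z = (p̂ − p)/√(pq/n). A minimal Python sketch, assuming Python 3.8+ for `statistics.NormalDist`; it rounds p̂ to 0.400 as the text does (with the unrounded 800/2001 the z-score is about 4.67), and the variable names are illustrative.

```python
import math
from statistics import NormalDist

n, x = 2001, 800
p0 = 0.35                          # claimed population proportion
p_hat = round(x / n, 3)            # 0.400, rounded as in the text
se = math.sqrt(p0 * (1 - p0) / n)  # standard error of p-hat if the claim is true
z = (p_hat - p0) / se
upper_tail = 1 - NormalDist().cdf(z)  # P(Z >= z) if the claim is true
print(f"z = {z:.2f}, P(Z >= z) = {upper_tail:.1e}")
```

The tail probability is on the order of one in a million, which is what "extremely small" means here.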




7.1


What You Should Learn 


• A practical introduction to hypothesis tests

• How to state a null hypothesis and an alternative hypothesis

• How to identify type I and type II errors and interpret the level of significance

• How to know whether to use a one-tailed or two-tailed statistical test and find a P-value

• How to make and interpret a decision based on the results of a statistical test

• How to write a claim for a hypothesis test


Study Tip
As you study this chapter, 
do not get confused 
regarding concepts of 
certainty and importance. 
For instance, even if you 
were very certain that the 
mean gas mileage of a type of hybrid 
vehicle is not 50 miles per gallon, the 
actual mean mileage might be very 
close to this value and the difference 
might not be important. 


Introduction to Hypothesis Testing 


Hypothesis Tests • Stating a Hypothesis • Types of Errors and Level of
Significance • Statistical Tests and P-Values • Making a Decision and
Interpreting the Decision • Strategies for Hypothesis Testing


Hypothesis Tests 


Throughout the remainder of this text, you will study an important technique in 
inferential statistics called hypothesis testing. A hypothesis test is a process that 
uses sample statistics to test a claim about the value of a population parameter. 
Researchers in fields such as medicine, psychology, and business rely on 
hypothesis testing to make informed decisions about new medicines, treatments, 
and marketing strategies. 

For instance, consider a manufacturer that advertises its new hybrid car has 
a mean gas mileage of 50 miles per gallon. If you suspect that the mean mileage 
is not 50 miles per gallon, how could you show that the advertisement is false? 

Obviously, you cannot test all the vehicles, but you can still make a 
reasonable decision about the mean gas mileage by taking a random sample 
from the population of vehicles and measuring the mileage of each. If the sample 
mean differs enough from the advertisement’s mean, you can decide that the 
advertisement is wrong. 

For instance, to test that the mean gas mileage of all hybrid vehicles of this
type is μ = 50 miles per gallon, you take a random sample of n = 30 vehicles
and measure the mileage of each. You obtain a sample mean of x̄ = 47 miles per
gallon with a sample standard deviation of s = 5.5 miles per gallon. Does this
indicate that the manufacturer’s advertisement is false?

To decide, you do something unusual: you assume the advertisement is
correct! That is, you assume that μ = 50. Then, you examine the sampling
distribution of sample means (with n = 30) taken from a population in which
μ = 50 and σ = 5.5. From the Central Limit Theorem, you know this sampling
distribution is normal with a mean of 50 and standard error of

$$\frac{\sigma}{\sqrt{n}} = \frac{5.5}{\sqrt{30}} \approx 1.$$


In the figure below, notice that the sample mean of x̄ = 47 miles per gallon
is highly unlikely: it is about 3 standard errors (z ≈ −2.99) from the claimed
mean! Using the techniques you studied in Chapter 5, you can determine that
if the advertisement is true, then the probability of obtaining a sample mean
of 47 or less is about 0.001. This is an unusual event! Your assumption that the
company’s advertisement is correct has led you to an improbable result. So,
either you had a very unusual sample, or the advertisement is probably false. The
logical conclusion is that the advertisement is probably false.


[Figure: Sampling Distribution of x̄, centered at the hypothesized mean μ = 50, with the sample mean x̄ = 47 marked about 3 standard errors to the left (z axis from −4 to 4)]
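The numbers in the hybrid-vehicle example can be reproduced in a few lines: the standard error σ/√n, z = (x̄ − μ)/(σ/√n), and the left-tail probability P(x̄ ≤ 47) when μ = 50. A minimal Python sketch, assuming Python 3.8+ for `statistics.NormalDist` and, as the text does, using 5.5 for σ.

```python
import math
from statistics import NormalDist

mu0 = 50      # advertised (hypothesized) mean, in miles per gallon
sigma = 5.5   # the text uses 5.5 for sigma
n = 30
x_bar = 47

se = sigma / math.sqrt(n)     # standard error, about 1.004
z = (x_bar - mu0) / se        # about -2.99
p_left = NormalDist().cdf(z)  # P(x-bar <= 47) if mu = 50
print(f"SE = {se:.3f}, z = {z:.2f}, P = {p_left:.4f}")
```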


Study Tip 


The term null hypothesis 
was introduced by Ronald 
Fisher (see page 57). If 
the statement in the null 
hypothesis is not true, 
then the alternative 
hypothesis must be true. 


Picturing the World


A study was done on the effect of a 
wearable fitness device combined 
with a low-calorie diet on weight 
loss. The study used a random 
sample of 237 adults. At the end of 
the study, the adults had a mean 
weight loss of 3.5 kilograms. So,
it is claimed that the mean weight
loss is 3.5 kilograms for all adults 
who use a wearable fitness device 
combined with a low-calorie diet. 
(Adapted from The Journal of the 
American Medical Association) 


Determine a null hypothesis and 
alternative hypothesis for this 
claim. 




Stating a Hypothesis 


A statement about a population parameter is called a statistical hypothesis. To 
test a population parameter, you should carefully state a pair of hypotheses— 
one that represents the claim and the other, its complement. When one of 
these hypotheses is false, the other must be true. Either hypothesis—the null 
hypothesis or the alternative hypothesis—may represent the original claim. 


DEFINITION 


1. A null hypothesis H₀ is a statistical hypothesis that contains a statement of
equality, such as ≤, =, or ≥.

2. The alternative hypothesis Hₐ is the complement of the null hypothesis.
It is a statement that must be true if H₀ is false and it contains a statement
of strict inequality, such as >, ≠, or <.


The symbol H₀ is read as “H sub-zero” or “H naught” and Hₐ is read as
“H sub-a.”


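To write the null and alternative hypotheses, translate the claim made
about the population parameter from a verbal statement to a mathematical
statement. Then, write its complement. For instance, if the claim value is k and
the population parameter is μ, then some possible pairs of null and alternative
hypotheses are

$$\begin{cases} H_0\colon \mu \le k \\ H_a\colon \mu > k \end{cases} \qquad \begin{cases} H_0\colon \mu \ge k \\ H_a\colon \mu < k \end{cases} \qquad \text{and} \qquad \begin{cases} H_0\colon \mu = k \\ H_a\colon \mu \ne k \end{cases}$$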


Regardless of which of the three pairs of hypotheses you use, you always
assume μ = k and examine the sampling distribution on the basis of this
assumption. Within this sampling distribution, you will determine whether or not
a sample statistic is unusual.

The table shows the relationship between possible verbal statements
about the parameter μ and the corresponding null and alternative hypotheses.
Similar statements can be made to test other population parameters, such as p,
σ, or σ².


Verbal Statement H₀               Mathematical     Verbal Statement Hₐ
The mean is ...                   Statements       The mean is ...

... greater than or equal to k.   H₀: μ ≥ k        ... less than k.
... at least k.                   Hₐ: μ < k        ... below k.
... not less than k.                               ... fewer than k.
... not shorter than k.                            ... shorter than k.

... less than or equal to k.      H₀: μ ≤ k        ... greater than k.
... at most k.                    Hₐ: μ > k        ... above k.
... not more than k.                               ... more than k.
... not longer than k.                             ... longer than k.

... equal to k.                   H₀: μ = k        ... not equal to k.
... k.                            Hₐ: μ ≠ k        ... different from k.
... exactly k.                                     ... not k.
... the same as k.                                 ... different from k.
... not changed from k.                            ... changed from k.




Stating the Null and Alternative Hypotheses 


Write each claim as a mathematical statement. State the null and alternative 
hypotheses, and identify which represents the claim. 


1. A school publicizes that the proportion of its students who are involved in 
at least one extracurricular activity is 61%. 


2. A car dealership announces that the mean time for an oil change is less than 
15 minutes. 


3. A company advertises that the mean life of its furnaces is more than 18 years. 


SOLUTION
1. The claim “the proportion . . . is 61%” can be written as p = 0.61. Its
complement is p ≠ 0.61, as shown in the figure at the left. Because
p = 0.61 contains the statement of equality, it becomes the null hypothesis.
In this case, the null hypothesis represents the claim. You can write the null
and alternative hypotheses as shown.

H₀: p = 0.61 (Claim)
Hₐ: p ≠ 0.61

2. The claim “the mean . . . is less than 15 minutes” can be written as μ < 15.
Its complement is μ ≥ 15, as shown in the figure at the left. Because μ ≥ 15
contains the statement of equality, it becomes the null hypothesis. In this
case, the alternative hypothesis represents the claim. You can write the null
and alternative hypotheses as shown.

H₀: μ ≥ 15 minutes
Hₐ: μ < 15 minutes (Claim)

3. The claim “the mean . . . is more than 18 years” can be written as μ > 18.
Its complement is μ ≤ 18, as shown in the figure at the left. Because μ ≤ 18
contains the statement of equality, it becomes the null hypothesis. In this
case, the alternative hypothesis represents the claim. You can write the null
and alternative hypotheses as shown.

H₀: μ ≤ 18 years
Hₐ: μ > 18 years (Claim)

In the three figures at the left, notice that each point on the number line is in
either H₀ or Hₐ, but no point is in both.
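Every part of Example 1 follows the same rule: the statement containing equality (=, ≤, or ≥) becomes H₀, its complement becomes Hₐ, and the claim can land on either side. A small illustrative sketch of that rule as a lookup table; the names `COMPLEMENT` and `state_hypotheses` are hypothetical.

```python
# Claim operator -> (H0 operator, Ha operator, which hypothesis is the claim).
# The operator containing equality always goes in H0.
COMPLEMENT = {
    "=": ("=", "≠", "H0"),
    "≠": ("=", "≠", "Ha"),
    "≥": ("≥", "<", "H0"),
    "<": ("≥", "<", "Ha"),
    "≤": ("≤", ">", "H0"),
    ">": ("≤", ">", "Ha"),
}


def state_hypotheses(param, claim_op, k):
    """Return (H0, Ha, label of the hypothesis that represents the claim)."""
    h0_op, ha_op, claim = COMPLEMENT[claim_op]
    return f"H0: {param} {h0_op} {k}", f"Ha: {param} {ha_op} {k}", claim


print(state_hypotheses("p", "=", 0.61))   # part 1: claim is H0
print(state_hypotheses("mu", "<", 15))    # part 2: claim is Ha
print(state_hypotheses("mu", ">", 18))    # part 3: claim is Ha
```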


TRY IT YOURSELF 1 


Write each claim as a mathematical statement. State the null and alternative 
hypotheses, and identify which represents the claim. 


1. A consumer analyst reports that the mean life of a certain type of automobile
battery is not 74 months. 
2. An electronics manufacturer publishes that the variance of the life of its 
home theater systems is less than or equal to 2.7. 
3. A realtor publicizes that the proportion of homeowners who feel their 
house is too small for their family is more than 24%. 
Answer: Page A36 


In Example 1, notice that the claim is represented by either the null 
hypothesis or the alternative hypothesis. 




Types of Errors and Level of Significance 


No matter which hypothesis represents the claim, you always begin a hypothesis 
test by assuming that the equality condition in the null hypothesis is true. So, 
when you perform a hypothesis test, you make one of two decisions: 


1. reject the null hypothesis 
or 


2. fail to reject the null hypothesis. 


Because your decision is based on a sample rather than the entire population, 
there is always the possibility you will make the wrong decision. 

For instance, you claim that a coin is not fair. To test your claim, you toss the 
coin 100 times and get 49 heads and 51 tails. You would probably agree that you 
do not have enough evidence to support your claim. Even so, it is possible that 
the coin is actually not fair and you had an unusual sample. 

But then you toss the coin 100 times and get 21 heads and 79 tails. It would 
be a rare occurrence to get only 21 heads out of 100 tosses with a fair coin. So, 
you probably have enough evidence to support your claim that the coin is not 
fair. However, you cannot be 100% sure. It is possible that the coin is fair and 
you had an unusual sample. 

Letting p represent the proportion of heads, the claim that “the coin is not
fair” can be written as the mathematical statement p ≠ 0.5. Its complement,
“the coin is fair,” is written as p = 0.5, as shown in the figure.


So, the null hypothesis is
H₀: p = 0.5

and the alternative hypothesis is
Hₐ: p ≠ 0.5. (Claim)
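The coin example turns on how likely each outcome is when H₀: p = 0.5 is true. A minimal sketch using exact binomial probabilities from `math.comb` (Python 3.8+); `prob_at_most` is a hypothetical helper, and it computes only the lower-tail probability.

```python
from math import comb


def prob_at_most(k, n, p=0.5):
    """P(X <= k) for a binomial random variable X with n trials and success probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Lower-tail probabilities if the coin is fair (H0: p = 0.5), 100 tosses
print(f"P(X <= 49) = {prob_at_most(49, 100):.4f}")  # 0.4602: not unusual
print(f"P(X <= 21) = {prob_at_most(21, 100):.2e}")  # far below 0.05: very unusual
```

Even the tiny second probability is not zero, which is why a fair coin producing 21 heads cannot be ruled out with 100% certainty.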


Remember, the only way to be absolutely certain of whether H₀ is true or
false is to test the entire population. Because your decision (to reject H₀ or
to fail to reject H₀) is based on a sample, you must accept the fact that your
decision might be incorrect. You might reject a null hypothesis when it is actually
true. Or, you might fail to reject a null hypothesis when it is actually false. These
types of errors are summarized in the next definition.


DEFINITION 


A type I error occurs if the null hypothesis is rejected when it is true. 


A type II error occurs if the null hypothesis is not rejected when it is false. 


The table shows the four possible outcomes of a hypothesis test. 


                        Truth of H₀
Decision             H₀ is true.         H₀ is false.
Do not reject H₀.    Correct decision    Type II error
Reject H₀.           Type I error        Correct decision




                  Truth about defendant
Verdict        Innocent        Guilty
Not guilty     Justice         Type II error
Guilty         Type I error    Justice


Hypothesis testing is sometimes compared to the legal system used in the 
United States. Under this system, these steps are used. 


1. A carefully worded accusation is written. 


2. The defendant is assumed innocent (H₀) until proven guilty. The burden of
proof lies with the prosecution. If the evidence is not strong enough, then 
there is no conviction. A “not guilty” verdict does not prove that a defendant 
is innocent. 


3. The evidence needs to be conclusive beyond a reasonable doubt. The system 
assumes that more harm is done by convicting the innocent (type I error) than 
by not convicting the guilty (type II error).


The table at the left shows the four possible outcomes. 


Identifying Type I and Type II Errors

The USDA limit for salmonella contamination for ground beef is 7.5%. A 
meat inspector reports that the ground beef produced by a company exceeds 
the USDA limit. You perform a hypothesis test to determine whether the meat 
inspector’s claim is true. When will a type I or type II error occur? Which error 
is more serious? (Source: U.S. Department of Agriculture) 


SOLUTION 

Let p represent the proportion of the ground beef that is contaminated. The
meat inspector’s claim is “more than 7.5% is contaminated.” You can write the
null hypothesis as

H₀: p ≤ 0.075    The proportion is less than or equal to 0.075.

and the alternative hypothesis is

Hₐ: p > 0.075. (Claim)    The proportion is greater than 0.075.

You can visualize the null and alternative hypotheses using a number line, as
shown below.

[Figure: number line for p from 0.055 to 0.095; H₀: p ≤ 0.075 (ground beef meets USDA limits) lies at and to the left of 0.075, and Hₐ: p > 0.075 (ground beef exceeds USDA limits) lies to the right]


A type I error will occur when the actual proportion of contaminated ground
beef is less than or equal to 0.075, but you reject H0. A type II error will occur
when the actual proportion of contaminated ground beef is greater than 0.075,
but you do not reject H0. With a type I error, you might create a health scare
and hurt the sales of ground beef producers who were actually meeting the
USDA limits. With a type II error, you could be allowing ground beef that
exceeded the USDA contamination limit to be sold to consumers. A type II
error is more serious because it could result in sickness or even death.


TRY IT YOURSELF 2 


A company specializing in parachute assembly states that its main parachute 
failure rate is not more than 1%. You perform a hypothesis test to determine 
whether the company’s claim is false. When will a type I or type II error occur?
Which error is more serious? 

Answer: Page A36 


Study Tip 


When you decrease α (the maximum allowable probability of making a
type I error), you are likely to be increasing β. The value 1 − β is called the
power of the test. It represents the probability of rejecting the null
hypothesis when it is false. The value of the power is difficult (and
sometimes impossible) to find in most cases.
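The trade-off in this tip can be made concrete in one special case. The Python sketch below assumes a right-tailed z-test (H0: μ ≤ μ0, Ha: μ > μ0) with σ known and a normally distributed sample mean, and it needs a specific assumed true mean to compute β at all; in practice that true mean is unknown, which is one reason power is hard to find. The function name and the scenario values (μ0 = 50, true mean 51, σ = 4, n = 36) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def type_ii_error_prob(alpha, mu0, mu_true, sigma, n):
    """beta for a right-tailed z-test of H0: mu <= mu0 versus Ha: mu > mu0.

    Assumes sigma is known and the sample mean is normally distributed.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)       # reject H0 when z > z_crit
    shift = (mu_true - mu0) / (sigma / sqrt(n))  # mean of z when mu = mu_true
    return STD_NORMAL.cdf(z_crit - shift)        # P(fail to reject | mu = mu_true)


if __name__ == "__main__":
    for alpha in (0.10, 0.05, 0.01):
        beta = type_ii_error_prob(alpha, mu0=50, mu_true=51, sigma=4, n=36)
        print(f"alpha = {alpha:.2f}  beta = {beta:.4f}  power = {1 - beta:.4f}")
```

With these assumed numbers, lowering α from 0.10 to 0.01 raises β from about 0.41 to about 0.80, so the power falls.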


SECTION 7.1 Introduction to Hypothesis Testing 375 


You will reject the null hypothesis when the sample statistic from the 
sampling distribution is unusual. You have already identified unusual events 
to be those that occur with a probability of 0.05 or less. When statistical tests 
are used, an unusual event is sometimes required to have a probability of 0.10 
or less, 0.05 or less, or 0.01 or less. Because there is variation from sample to 
sample, there is always a possibility that you will reject a null hypothesis when 
it is actually true. In other words, although the null hypothesis is true, your 
sample statistic is determined to be an unusual event in the sampling distribution. 
You can decrease the probability of this happening by lowering the level of 
significance. 


DEFINITION 


In a hypothesis test, the level of significance is your maximum allowable
probability of making a type I error. It is denoted by α, the lowercase Greek
letter alpha.


The probability of a type II error is denoted by β, the lowercase Greek
letter beta.


By setting the level of significance at a small value, you are saying that 
you want the probability of rejecting a true null hypothesis to be small. Three 
commonly used levels of significance are 


α = 0.10, α = 0.05, and α = 0.01.


Statistical Tests and P-Values 


After stating the null and alternative hypotheses and specifying the level of 
significance, the next step in a hypothesis test is to obtain a random sample 
from the population and calculate the sample statistic (such as x̄, p̂, or s²)
corresponding to the parameter in the null hypothesis (such as μ, p, or σ²).
This sample statistic is called the test statistic. With the assumption that the
null hypothesis is true, the test statistic is then converted to a standardized test
statistic, such as z, t, or χ². The standardized test statistic is used in making the
decision about the null hypothesis.

In this chapter, you will learn about several one-sample statistical tests. 
The table shows the relationships between population parameters and their 
corresponding test statistics and standardized test statistics. 


Population parameter | Test statistic | Standardized test statistic
μ | x̄ | z (Section 7.2, σ known), t (Section 7.3, σ unknown)
p | p̂ | z (Section 7.4)
σ² | s² | χ² (Section 7.5)


One way to decide whether to reject the null hypothesis is to determine 
whether the probability of obtaining the standardized test statistic (or one that is 
more extreme) is less than the level of significance. 


DEFINITION 


If the null hypothesis is true, then a P-value (or probability value) of a
hypothesis test is the probability of obtaining a sample statistic with a value
as extreme or more extreme than the one determined from the sample data.
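This definition can be checked by brute force. Assume H0 is true, draw many random samples, and count how often the sample mean is at least as extreme as the observed one. The Python sketch below does this for a left-tailed case under hypothetical values: a normal population with μ0 = 100 and σ = 15, n = 30, and an observed x̄ = 96. It then compares the simulated proportion with the exact area under the sampling distribution of x̄. The function names are illustrative only.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def simulated_left_tail_p(x_bar_obs, mu0, sigma, n, trials=20_000, seed=1):
    """Estimate P(sample mean <= x_bar_obs) assuming H0 (mu = mu0) is true,
    by drawing many random samples of size n from a normal population."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    as_extreme = 0
    for _ in range(trials):
        sample_mean = fmean(rng.gauss(mu0, sigma) for _ in range(n))
        if sample_mean <= x_bar_obs:  # "as extreme or more extreme" in the left tail
            as_extreme += 1
    return as_extreme / trials


def exact_left_tail_p(x_bar_obs, mu0, sigma, n):
    """Area to the left of x_bar_obs under the sampling distribution of x-bar."""
    return NormalDist(mu0, sigma / sqrt(n)).cdf(x_bar_obs)


if __name__ == "__main__":
    print(simulated_left_tail_p(96, 100, 15, 30))
    print(exact_left_tail_p(96, 100, 15, 30))
```

The two numbers should agree to about two decimal places. A larger `trials` value narrows the gap at the cost of run time.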




The P-value of a hypothesis test depends on the nature of the test. There 
are three types of hypothesis tests—left-tailed, right-tailed, and two-tailed. 
The type of test depends on the location of the region of the sampling 
distribution that favors a rejection of H0. This region is indicated by the
alternative hypothesis. 


DEFINITION 


1. If the alternative hypothesis Ha contains the less-than inequality symbol
(<), then the hypothesis test is a left-tailed test.

[Figure: Left-Tailed Test. P is the area to the left of the standardized test statistic.]


2. If the alternative hypothesis Ha contains the greater-than inequality
symbol (>), then the hypothesis test is a right-tailed test.

[Figure: Right-Tailed Test, H0: μ ≤ k, Ha: μ > k. P is the area to the right of the standardized test statistic.]


3. If the alternative hypothesis Ha contains the not-equal-to symbol (≠),
then the hypothesis test is a two-tailed test. In a two-tailed test, each tail
has an area of ½P.


Study Tip 
The third type of test is 
called a two-tailed test 
because evidence that 
would support the 
alternative hypothesis 
could lie in either tail of 
the sampling distribution. 


[Figure: Two-Tailed Test, H0: μ = k, Ha: μ ≠ k. The area to the left of the negative standardized test statistic is ½P, and the area to the right of the positive standardized test statistic is ½P.]


The smaller the P-value of the test, the more evidence there is to reject the 
null hypothesis. A very small P-value indicates an unusual event. Remember, 
however, that even a very low P-value does not constitute proof that the null 
hypothesis is false, only that it is probably false. 


[Figures: normal sampling distributions with the P-value area shaded beyond the standardized test statistic.]




Identifying the Nature of a Hypothesis Test 


For each claim, state H0 and Ha in words and in symbols. Then determine
whether the hypothesis test is a left-tailed test, right-tailed test, or two-tailed
test. Sketch a normal sampling distribution and shade the area for the P-value.

1. A school publicizes that the proportion of its students who are involved in 
at least one extracurricular activity is 61%. 

2. A car dealership announces that the mean time for an oil change is less than 
15 minutes. 


3. A company advertises that the mean life of its furnaces is more than 18 years. 


SOLUTION 
In Symbols | In Words
1. H0: p = 0.61 | The proportion of students who are involved in at least one extracurricular activity is 61%.
Ha: p ≠ 0.61 | The proportion of students who are involved in at least one extracurricular activity is not 61%.

Because Ha contains the ≠ symbol, the test is a two-tailed hypothesis test.
The figure at the left shows the normal sampling distribution with a shaded
area for the P-value.


In Symbols | In Words
2. H0: μ ≥ 15 min | The mean time for an oil change is greater than or equal to 15 minutes.
Ha: μ < 15 min | The mean time for an oil change is less than 15 minutes.

Because Ha contains the < symbol, the test is a left-tailed hypothesis test.
The figure at the left shows the normal sampling distribution with a shaded
area for the P-value.


In Symbols | In Words
3. H0: μ ≤ 18 yr | The mean life of the furnaces is less than or equal to 18 years.
Ha: μ > 18 yr | The mean life of the furnaces is more than 18 years.

Because Ha contains the > symbol, the test is a right-tailed hypothesis test.
The figure at the left shows the normal sampling distribution with a shaded
area for the P-value.


TRY IT YOURSELF 3 


For each claim, state H0 and Ha in words and in symbols. Then determine
whether the hypothesis test is a left-tailed test, right-tailed test, or two-tailed 
test. Sketch a normal sampling distribution and shade the area for the P-value. 


1. A consumer analyst reports that the mean life of a certain type of automobile 
battery is not 74 months. 
2. An electronics manufacturer publishes that the variance of the life of its 
home theater systems is less than or equal to 2.7. 
3. A realtor publicizes that the proportion of homeowners who feel their 
house is too small for their family is more than 24%. 
Answer: Page A36 




Study Tip 


In this chapter, you will learn that there are two types of decision rules
for deciding whether to reject H0 or fail to reject H0. The decision rule
described on this page is based on P-values. The second type of decision
rule is based on rejection regions. When the standardized test statistic falls
in the rejection region, the observed probability (P-value) of a type I error
is less than α. You will learn more about rejection regions in the next section.




Making a Decision and Interpreting the Decision 


To conclude a hypothesis test, you make a decision and interpret that decision.
For any hypothesis test, there are two possible outcomes: (1) reject the null
hypothesis or (2) fail to reject the null hypothesis. To decide whether to reject H0
or fail to reject H0, you can use the following decision rule.


Decision Rule Based on P-Value


To use a P-value to make a decision in a hypothesis test, compare the P-value
with α.


1. If P ≤ α, then reject H0.
2. If P > α, then fail to reject H0.
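As a minimal sketch, the decision rule is a single comparison. The Python function name `decide` is hypothetical; the sample calls apply it to a P-value of 0.0237 at two levels of significance.

```python
def decide(p_value: float, alpha: float) -> str:
    """Decision rule based on a P-value: reject H0 when P <= alpha."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    return "Reject H0" if p_value <= alpha else "Fail to reject H0"


print(decide(0.0237, 0.05))  # Reject H0
print(decide(0.0237, 0.01))  # Fail to reject H0
```

Note that the comparison is `<=`, so a P-value exactly equal to α leads to rejecting H0, as the rule states.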


Failing to reject the null hypothesis does not mean that you have accepted 
the null hypothesis as true. It simply means that there is not enough evidence 
to reject the null hypothesis. To support a claim, state it so that it becomes the 
alternative hypothesis. To reject a claim, state it so that it becomes the null 
hypothesis. The table will help you interpret your decision. 


Decision | Claim is H0. | Claim is Ha.
Reject H0. | There is enough evidence to reject the claim. | There is enough evidence to support the claim.
Fail to reject H0. | There is not enough evidence to reject the claim. | There is not enough evidence to support the claim.
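The interpretation table depends on only two facts: whether the claim is H0 or Ha, and whether H0 was rejected. The Python sketch below encodes that dependence. The function name `interpret` is hypothetical, and the wording mirrors the table.

```python
def interpret(claim_is_null: bool, reject_h0: bool) -> str:
    """Return the interpretation sentence from the decision table."""
    verb = "reject" if claim_is_null else "support"   # claim is H0 versus claim is Ha
    amount = "enough" if reject_h0 else "not enough"  # reject H0 versus fail to reject H0
    return f"There is {amount} evidence to {verb} the claim."


# Example: the claim is Ha and the test rejects H0.
print(interpret(claim_is_null=False, reject_h0=True))
```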


Interpreting a Decision 


You perform a hypothesis test for each claim. How should you interpret your
decision if you reject H0? If you fail to reject H0?


1. H0 (Claim): A school publicizes that the proportion of its students who are
involved in at least one extracurricular activity is 61%.


2. Ha (Claim): A car dealership announces that the mean time for an oil
change is less than 15 minutes.


SOLUTION 


1. The claim is represented by H0. If you reject H0, then you should conclude
“there is enough evidence to reject the school’s claim that the proportion of
students who are involved in at least one extracurricular activity is 61%.”
If you fail to reject H0, then you should conclude “there is not enough
evidence to reject the school’s claim that the proportion of students who are
involved in at least one extracurricular activity is 61%.”


2. The claim is represented by Ha, so the null hypothesis is “the mean time for
an oil change is greater than or equal to 15 minutes.” If you reject H0, then
you should conclude “there is enough evidence to support the dealership’s
claim that the mean time for an oil change is less than 15 minutes.” If you
fail to reject H0, then you should conclude “there is not enough evidence to
support the dealership’s claim that the mean time for an oil change is less
than 15 minutes.”




TRY IT YOURSELF 4 


You perform a hypothesis test for each claim. How should you interpret your
decision if you reject H0? If you fail to reject H0?


1. A consumer analyst reports that the mean life of a certain type of automobile
battery is not 74 months.


2. Ha (Claim): A realtor publicizes that the proportion of homeowners who
feel their house is too small for their family is more than 24%.
Answer: Page A36 


The general steps for a hypothesis test using P-values are summarized below. 
Note that when performing a hypothesis test, you should always state the null 
and alternative hypotheses before collecting data. You should not collect the 
data first and then create a hypothesis based on something unusual in the data. 


Steps for Hypothesis Testing 


1. State the claim mathematically and verbally. Identify the null and
alternative hypotheses.

2. Specify the level of significance.

3. Determine the standardized sampling distribution and sketch its graph.
This sampling distribution is based on the assumption that H0 is true.

4. Calculate the test statistic and its corresponding standardized test
statistic. Add it to your sketch.

[Figure: right-tailed standardized sampling distribution centered at 0, with the standardized test statistic marked.]

5. Find the P-value.

6. Use this decision rule: Is the P-value less than or equal to the level of
significance? If yes, reject H0. If no, fail to reject H0.

7. Write a statement to interpret the decision in the context of the original
claim.


In Step 4 above, the figure shows a right-tailed test. However, the same basic 
steps also apply to left-tailed and two-tailed tests. 




Strategies for Hypothesis Testing 


In a courtroom, the strategy used by an attorney depends on whether the attorney 
is representing the defense or the prosecution. In a similar way, the strategy that 
you will use in hypothesis testing should depend on whether you are trying to 
support or reject a claim. Remember that you cannot use a hypothesis test to 
support your claim when your claim is the null hypothesis. So, as a researcher, 
to perform a hypothesis test where the possible outcome will support a claim, 
word the claim so it is the alternative hypothesis. To perform a hypothesis test 
where the possible outcome will reject a claim, word it so the claim is the null 
hypothesis. 


Writing the Hypotheses 


A medical research team is investigating the benefits of a new surgical 
treatment. One of the claims is that the mean recovery time for patients after 
the new treatment is less than 96 hours. 


1. How would you write the null and alternative hypotheses when you are on
the research team and want to support the claim? How should you interpret
a decision that rejects the null hypothesis?


2. How would you write the null and alternative hypotheses when you are on
an opposing team and want to reject the claim? How should you interpret a
decision that rejects the null hypothesis?


SOLUTION 


1. To answer the question, first think about the context of the claim. Because
you want to support this claim, make the alternative hypothesis state that
the mean recovery time for patients is less than 96 hours. So, Ha: μ < 96
hours. Its complement, H0: μ ≥ 96 hours, would be the null hypothesis. If
you reject H0, then you will support the claim that the mean recovery time
is less than 96 hours.


H0: μ ≥ 96 and Ha: μ < 96 (Claim)


2. First think about the context of the claim. As an opposing researcher, you
do not want the recovery time to be less than 96 hours. Because you want
to reject this claim, make it the null hypothesis. Because a null hypothesis
must contain a statement of equality, write it as H0: μ ≤ 96 hours. Its
complement, Ha: μ > 96 hours, would be the alternative hypothesis. If you
reject H0, then you will reject the claim that the mean recovery time is less
than or equal to 96 hours.


H0: μ ≤ 96 (Claim) and Ha: μ > 96


TRY IT YOURSELF 5 


1. You represent a chemical company that is being sued for paint damage to
automobiles. You want to support the claim that the mean repair cost per
automobile is less than $650. How would you write the null and alternative
hypotheses? How should you interpret a decision that rejects the null
hypothesis?


2. You are on a research team that is investigating the mean temperature of
adult humans. The commonly accepted claim is that the mean temperature
is about 98.6°F. You want to show that this claim is false. How would you
write the null and alternative hypotheses? How should you interpret a
decision that rejects the null hypothesis?

Answer: Page A36 




7.1 EXERCISES


Building Basic Skills and Vocabulary 


1. What are the two types of hypotheses used in a hypothesis test? How are 
they related? 


2. Describe the two types of errors possible in a hypothesis test decision. 


3. What are the two decisions that you can make from performing a 
hypothesis test? 


4. Does failing to reject the null hypothesis mean that the null hypothesis 
is true? Explain. 


True or False? In Exercises 5-10, determine whether the statement is true or
false. If it is false, rewrite it as a true statement. 


5. In a hypothesis test, you assume the alternative hypothesis is true.
6. A statistical hypothesis is a statement about a sample. 


7. If you decide to reject the null hypothesis, then you can support the 
alternative hypothesis. 


8. The level of significance is the maximum probability you allow for rejecting 
a null hypothesis when it is actually true. 


9. A large P-value in a test will favor rejection of the null hypothesis. 


10. To support a claim, state it so that it becomes the null hypothesis. 


Stating Hypotheses In Exercises 11-16, the statement represents a claim.
Write its complement and state which is H0 and which is Ha.


11. pw = 645 12. w < 128 
13. 0 45 14. 0° = 12 
15. p < 0.45 16. p = 0.21 


Graphical Analysis In Exercises 17-20, match the alternative hypothesis with
its graph. Then state the null hypothesis and sketch its graph.


17. Ha: μ > 3
18. Ha: μ < 3
19. Ha: μ ≠ 3
20. Ha: μ > 2

[Graphs (a) through (d): number lines from 1 to 4, each shading the region described by one of the alternative hypotheses.]


Identifying a Test In Exercises 21-24, determine whether the hypothesis test
is left-tailed, right-tailed, or two-tailed.


21. H0: μ ≤ 8.0, Ha: μ > 8.0
22. H0: σ ≥ 5.2, Ha: σ < 5.2
23. H0: σ² = 142, Ha: σ² ≠ 142
24. H0: p = 0.25, Ha: p ≠ 0.25




Using and Interpreting Concepts 


Stating the Null and Alternative Hypotheses In Exercises 25-30, write 
the claim as a mathematical statement. State the null and alternative hypotheses, 
and identify which represents the claim. 


25. Sale Price of a Bike The standard deviation of the sale price of a bike is no
more than $225.


26. Museum A museum claims that its mean daily attendance is at least 300
people.


27. Microwaves A microwave manufacturer claims that the mean life of the
thermostat for a certain model of microwave is more than 4 years.


28. Delivery Errors As stated by a courier company’s dispatch department, the
number of delivery errors per million consignments has a standard deviation
that is less than 2.


29. Paying for College According to a recent survey, 73% of college students
did not use student loans to pay for college. (Source: Sallie Mae)


30. Paying for College According to a recent survey, 52% of college students
used their own income or savings to pay for college. (Source: Sallie Mae)


Identifying Type I and Type II Errors In Exercises 31-36, describe type I
and type II errors for a hypothesis test of the indicated claim.


31. Repeat Customers An online food delivery website claims that at least
75% of its new customers will return to place their next order.


32. Flow Rate A hose manufacturer claims that the maximum water flow rate
in a garden is 40 liters per minute.


33. Cricket A local cricket club claims that the length of time to bowl an over
has a standard deviation of more than 4 minutes.


34. Surround Sound Speaker A researcher claims that the percentage of adults
in Egypt who own a surround sound speaker is not 40%.


35. Security A data encryption company publicizes that at least 98% of the
systems using their software are protected.


36. Laptop A laptop repair shop advertises that the mean cost of repairing a
laptop is less than $125.


Identifying the Nature of a Hypothesis Test In Exercises 37-42, state
H0 and Ha in words and in symbols. Then determine whether the hypothesis test
is left-tailed, right-tailed, or two-tailed. Explain your reasoning. Sketch a normal
sampling distribution and shade the area for the P-value.


37. Glass A glass manufacturer claims that the mean number of glasses that
break is no more than 3 glasses per production run.


38. Insurance Policies An insurance agent claims that at least 65% of all
businesses have a fire insurance policy.


39. IT College Placement Rate An IT college claims that its mean placement
rate is 95%.


40. Lung Cancer A report claims that lung cancer accounts for 25% of all
cancer diagnoses. (Source: American Cancer Society)


41. Derby A derby analyst claims that the standard deviation of a jockey
clearing the obstacles is less than 2 obstacles.


42. Survey A polling organization reports that the number of samples
distributed to 5,000 households does not increase the mean sale by 5,000
units.


Interpreting a Decision In Exercises 43-48, determine whether the claim
represents the null hypothesis or the alternative hypothesis. If a hypothesis test is
performed, how should you interpret a decision that


(a) rejects the null hypothesis?


(b) fails to reject the null hypothesis?


43. Air Conditioner A researcher claims that the standard deviation of the life
span of a brand of air conditioner is at most 4.6 years.


44. Affording Basic Necessities A report claims that more than 40% of
households in a New York county struggle to afford basic necessities.
(Source: Niagara Frontier Publications)


45. Migratory Birds A scientist claims that the mean number of migratory bird
species is less than 3,800.


46. Gas Mileage An automotive manufacturer claims that the standard
deviation for the gas mileage of one of the vehicles it manufactures is
5.6 kilometers per liter.


47. Terrorism Convictions A report claims that at least 65% of individuals
convicted of terrorism or terrorism-related offenses in the United States are
foreign born. (Source: Hannity.com)


48. Balanced Diet A non-governmental organization claims that none of the
children in a particular slum area get a balanced diet.


49. Writing Hypotheses: Health A fitness research team is investigating the
mean cost of a 45-day weight loss course. A fitness center thinks that the
mean cost is less than $75. You want to support this claim. How would you
write the null and alternative hypotheses?


50. Writing Hypotheses: Internet Service Provider An internet service provider
claims that the mean bandwidth drop time is about 7 minutes. You work for
one of the ISP’s competitors and want to reject the claim. How would you
write the null and alternative hypotheses?


51. Writing Hypotheses: Carbine Manufacturer A carbine manufacturer
claims that the mean life span of its competitor’s carbines is less than 46 runs.
You are asked to perform a hypothesis test to test this claim. How would you
write the null and alternative hypotheses when

(a) you represent the manufacturer and want to support the claim?
(b) you represent the competitor and want to reject the claim?


52. Writing Hypotheses: Internet Provider An Internet provider is trying to gain
advertising deals and claims that the mean time a customer spends online per
day is greater than 28 minutes. You are asked to test this claim. How would you
write the null and alternative hypotheses when

(a) you represent the Internet provider and want to support the claim?
(b) you represent a competing advertiser and want to reject the claim?




Extending Concepts 


53. Getting at the Concept Why can decreasing the probability of a type I error
cause an increase in the probability of a type II error?


54. Getting at the Concept Explain why a level of significance of α = 0 is
not used.


55. Writing A null hypothesis is rejected with a level of significance of 0.05.
Is it also rejected at a level of significance of 0.10? Explain.


56. Writing A null hypothesis is rejected with a level of significance of 0.10.
Is it also rejected at a level of significance of 0.05? Explain.


Graphical Analysis In Exercises 57-60, you are given a null hypothesis and
three confidence intervals that represent three samplings. Determine whether each
confidence interval indicates that you should reject H0. Explain your reasoning.


57. H0: μ ≥ 70
(a) 67 < μ < 71
(b) 67 < μ < 69
(c) 69.5 < μ < 72.5


58. H0: μ ≤ 54
(a) 53.5 < μ < 56.5
(b) 51.5 < μ < 54.5


59. H0: p ≤ 0.20
(a) 0.21 < p < 0.23
(b) 0.19 < p < 0.23
(c) 0.175 < p < 0.205


60. H0: p ≥ 0.73
(a) 0.73 < p < 0.75
(b) 0.715 < p < 0.725
(c) 0.695 < p < 0.745


7.2 Hypothesis Testing for the Mean (σ Known)


What You Should Learn

» How to find and interpret P-values
» How to use P-values for a z-test for a mean μ when σ is known
» How to find critical values and rejection regions in the standard normal distribution
» How to use rejection regions for a z-test for a mean μ when σ is known


SECTION 7.2 Hypothesis Testing for the Mean (σ Known) 385


Using P-Values to Make Decisions ■ Using P-Values for a z-Test ■ Rejection
Regions and Critical Values ■ Using Rejection Regions for a z-Test


Using P-Values to Make Decisions 


In Chapter 5, you learned that when the sample size is at least 30, the sampling
distribution for x̄ (the sample mean) is approximately normal. In Section 7.1, you
learned that one way to reach a conclusion in a hypothesis test is to use a P-value
for the sample statistic, such as x̄. Recall that when you assume the null hypothesis
is true, a P-value (or probability value) of a hypothesis test is the probability of
obtaining a sample statistic with a value as extreme or more extreme than the
one determined from the sample data. The decision rule for a hypothesis test
based on a P-value is shown below.


Decision Rule Based on P-Value 


To use a P-value to make a decision in a hypothesis test, compare the P-value
with α.


1. If P ≤ α, then reject H0.


2. If P > α, then fail to reject H0.


Interpreting a P-Value 

The P-value for a hypothesis test is P = 0.0237. What is your decision when
the level of significance is (1) α = 0.05 and (2) α = 0.01?

SOLUTION 

1. Because 0.0237 < 0.05, you reject the null hypothesis. 

2. Because 0.0237 > 0.01, you fail to reject the null hypothesis. 


TRY IT YOURSELF 1 


The P-value for a hypothesis test is P = 0.0745. What is your decision when
the level of significance is (1) α = 0.05 and (2) α = 0.10?    Answer: Page A37


The lower the P-value, the more evidence there is in favor of rejecting H0.
The P-value gives you the lowest level of significance for which the sample
statistic allows you to reject the null hypothesis. In Example 1, you would reject
H0 at any level of significance greater than or equal to 0.0237.


Finding the P-Value for a Hypothesis Test 


After determining the hypothesis test’s standardized test statistic and the 
standardized test statistic’s corresponding area, do one of the following to 
find the P-value. 


a. For a left-tailed test, P = (Area in left tail). 
b. For a right-tailed test, P = (Area in right tail). 
c. For a two-tailed test, P = 2(Area in tail of standardized test statistic). 
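The three cases can be written as one function. This is a minimal Python sketch that assumes the standard normal sampling distribution used for z-tests in this section. It uses `statistics.NormalDist` from the Python standard library in place of Table 4, and the function name `p_value` is hypothetical. The sample calls reproduce the left-tailed and two-tailed examples in this section.

```python
from statistics import NormalDist


def p_value(z: float, alternative: str) -> float:
    """P-value for a standardized test statistic z whose sampling distribution
    is standard normal. `alternative` is the symbol in Ha:
    "<" (left-tailed), ">" (right-tailed), or "!=" (two-tailed)."""
    left_area = NormalDist().cdf(z)  # area to the left of z
    if alternative == "<":
        return left_area             # a. area in left tail
    if alternative == ">":
        return 1 - left_area         # b. area in right tail
    if alternative == "!=":
        # c. twice the area in the tail beyond the standardized test statistic
        return 2 * min(left_area, 1 - left_area)
    raise ValueError('alternative must be "<", ">", or "!="')


print(round(p_value(-2.23, "<"), 4))   # 0.0129
print(round(p_value(2.14, "!="), 4))   # 0.0324
```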




[Figure: Left-Tailed Test. The area to the left of z = −2.23 is P = 0.0129.]

[Figure: Two-Tailed Test. The area to the right of z = 2.14 is 0.0162, so P = 2(0.0162) = 0.0324.]


Finding a P-Value for a Left-Tailed Test 
Find the P-value for a left-tailed hypothesis test with a standardized test
statistic of z = −2.23. Decide whether to reject H0 when the level of
significance is α = 0.01.
SOLUTION 


The figure at the left shows the standard normal curve with a shaded area to
the left of z = −2.23. For a left-tailed test,


P = (Area in left tail).


Using Table 4 in Appendix B, the area corresponding to z = −2.23 is 0.0129,
which is the area in the left tail. So, the P-value for a left-tailed hypothesis test
with a standardized test statistic of z = −2.23 is P = 0.0129. You can check
your answer using technology, as shown below.


=NORM.DIST(-2.23,0,1,TRUE)    0.012873721


Interpretation Because the P-value of 0.0129 is greater than 0.01, you fail to
reject H0.


TRY IT YOURSELF 2 


Find the P-value for a left-tailed hypothesis test with a standardized test
statistic of z = −1.71. Decide whether to reject H0 when the level of
significance is α = 0.05.

Answer: Page A37 


Finding a P-Value for a Two-Tailed Test 
Find the P-value for a two-tailed hypothesis test with a standardized test
statistic of z = 2.14. Decide whether to reject H0 when the level of significance
is α = 0.05.
SOLUTION 
The figure at the left shows the standard normal curve with shaded areas to the
left of z = −2.14 and to the right of z = 2.14. For a two-tailed test,

P = 2(Area in tail of standardized test statistic).


Using Table 4, the area corresponding to z = 2.14 is 0.9838. The area in the
right tail is 1 − 0.9838 = 0.0162. So, the P-value for a two-tailed hypothesis
test with a standardized test statistic of z = 2.14 is


P = 2(0.0162) = 0.0324.


Interpretation Because the P-value of 0.0324 is less than 0.05, you reject H0.


TRY IT YOURSELF 3 


Find the P-value for a two-tailed hypothesis test with a standardized test
statistic of z = 1.64. Decide whether to reject H0 when the level of significance
is α = 0.10.

Answer: Page A37 




Using P-Values for a z-Test 


You will now learn how to perform a hypothesis test for a mean μ assuming
the standard deviation σ is known. When σ is known, you can use a z-test for
the mean. To use the z-test, you need to find the standardized value for the test
statistic x̄. The standardized test statistic takes the form of

$$\frac{(\text{Sample mean}) - (\text{Hypothesized mean})}{\text{Standard error}}.$$


z-Test for a Mean μ


The z-test for a mean μ is a statistical test for a population mean. The test
statistic is the sample mean x̄. The standardized test statistic is

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \qquad \text{Standardized test statistic for } \mu \ (\sigma \text{ known})$$

when these conditions are met.


1. The sample is random.


2. At least one of the following is true: The population is normally distributed
or n ≥ 30.


Recall that σ/√n is the standard error of the mean, σx̄.
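The formula translates directly into code. This Python sketch only computes the standardized test statistic; it does not check the sampling conditions listed above, and the function name `z_statistic` is hypothetical. The sample call uses the pit-stop numbers from the example later in this section.

```python
from math import sqrt


def z_statistic(x_bar: float, mu0: float, sigma: float, n: int) -> float:
    """Standardized test statistic z = (x_bar - mu0) / (sigma / sqrt(n))
    for a mean when the population standard deviation sigma is known."""
    if n <= 0 or sigma <= 0:
        raise ValueError("n and sigma must be positive")
    standard_error = sigma / sqrt(n)  # standard error of the mean
    return (x_bar - mu0) / standard_error


print(round(z_statistic(12.9, 13, 0.19, 32), 2))  # -2.98
```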


GUIDELINES


Using P-Values for a z-Test for a Mean μ (σ Known)

In Words | In Symbols
1. Verify that σ is known, the sample is random, and either the population is normally distributed or n ≥ 30.
2. State the claim mathematically and verbally. Identify the null and alternative hypotheses. | State H0 and Ha.
3. Specify the level of significance. | Identify α.
4. Find the standardized test statistic. | $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
5. Find the area that corresponds to z. | Use Table 4 in Appendix B.
6. Find the P-value.
   a. For a left-tailed test, P = (Area in left tail).
   b. For a right-tailed test, P = (Area in right tail).
   c. For a two-tailed test, P = 2(Area in tail of standardized test statistic).
7. Make a decision to reject or fail to reject the null hypothesis. | If P ≤ α, then reject H0. Otherwise, fail to reject H0.
8. Interpret the decision in the context of the original claim.


With all hypothesis tests, it is helpful to sketch the sampling distribution. 
Your sketch should include the standardized test statistic. 
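Steps 4 through 7 of the guidelines translate almost line for line into code. This sketch makes several assumptions:

- It uses only the Python 3.8+ standard library.
- It takes exact normal areas from `statistics.NormalDist` instead of Table 4.
- The names `z_test_p_value` and `decide` are hypothetical.
- The `tail` argument takes `"left"`, `"right"`, or `"two"`.

```python
from math import sqrt
from statistics import NormalDist

def z_test_p_value(x_bar, mu_0, sigma, n, tail):
    """Steps 4-6: standardized test statistic z and its P-value.

    tail must be "left", "right", or "two" (a convention chosen for this sketch).
    """
    z = (x_bar - mu_0) / (sigma / sqrt(n))     # step 4: z = (x̄ - μ)/(σ/√n)
    left_area = NormalDist().cdf(z)            # step 5: area to the left of z
    if tail == "left":
        p = left_area                          # 6a: area in left tail
    elif tail == "right":
        p = 1 - left_area                      # 6b: area in right tail
    elif tail == "two":
        p = 2 * min(left_area, 1 - left_area)  # 6c: twice the tail area
    else:
        raise ValueError('tail must be "left", "right", or "two"')
    return z, p

def decide(p, alpha):
    """Step 7: reject H0 when P <= alpha."""
    return "reject H0" if p <= alpha else "fail to reject H0"

# Indoor-temperature example from this section: two-tailed, alpha = 0.05
z, p = z_test_p_value(67.2, 68.3, 3.5, 25, "two")
print(round(z, 2), round(p, 3), decide(p, 0.05))
```

With the unrounded z, P comes out near 0.116 instead of 0.1164. The difference arises because Table 4 is read at z rounded to -1.57. The decision (fail to reject H₀) is the same either way.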


388 CHAPTER 7 Hypothesis Testing with One Sample 


[Figure: Left-Tailed Test. The area to the left of z = -2.98 is P = 0.0014.]


Hypothesis Testing Using a P-Value 


In auto racing, a pit stop is where a racing vehicle stops for new tires, fuel, 
repairs, and other mechanical adjustments. The efficiency of a pit crew that 
makes these adjustments can affect the outcome of a race. A pit crew claims 
that its mean pit stop time (for 4 new tires and fuel) is less than 13 seconds. A 
random sample of 32 pit stop times has a sample mean of 12.9 seconds. Assume 
the population standard deviation is 0.19 second. Is there enough evidence to 
support the claim at α = 0.01? Use a P-value.


SOLUTION 


Because σ is known (σ = 0.19), the sample is random, and n = 32 ≥ 30, you
can use the z-test. The claim is “the mean pit stop time is less than 13 seconds.” 
So, the null and alternative hypotheses are 


H₀: μ ≥ 13 seconds and Hₐ: μ < 13 seconds. (Claim)


The level of significance is α = 0.01. The standardized test statistic is


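$$
\begin{aligned}
z &= \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} && \text{Because } \sigma \text{ is known and } n \ge 30, \text{ use the z-test.}\\
  &= \frac{12.9 - 13}{0.19/\sqrt{32}} && \text{Assume } \mu = 13.\\
  &\approx -2.98 && \text{Round to two decimal places.}
\end{aligned}
$$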


Using Table 4 in Appendix B, the area corresponding to z = -2.98 is 0.0014.
Because this test is a left-tailed test, the P-value is equal to the area to the left
of z = -2.98, as shown in the figure at the left. So, P = 0.0014. Because the
P-value is less than α = 0.01, you reject the null hypothesis. You can check your
answer using technology, as shown below. Note that the P-value differs slightly
from the one you found due to rounding.


STATCRUNCH

One sample Z hypothesis test:
μ : Mean of population
H₀ : μ = 13
Hₐ : μ < 13
Standard deviation = 0.19

Hypothesis test results:

Mean | n | Sample Mean | Std. Err. | Z-Stat | P-value
μ | 32 | 12.9 | 0.033587572 | -2.9772917 | 0.0015


Interpretation There is enough evidence at the 1% level of significance to 
support the claim that the mean pit stop time is less than 13 seconds. 
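You can reproduce the StatCrunch values from the summary statistics. The sketch below uses only Python's standard library, with `statistics.NormalDist` standing in for Table 4. It also shows where the small P-value discrepancy comes from by comparing two areas: one for the unrounded z and one for z rounded to -2.98.

```python
from math import sqrt
from statistics import NormalDist

# Pit stop example: left-tailed z-test with sigma known
x_bar, mu_0, sigma, n = 12.9, 13, 0.19, 32

std_err = sigma / sqrt(n)                   # standard error of the mean
z = (x_bar - mu_0) / std_err                # unrounded standardized test statistic
p_unrounded = NormalDist().cdf(z)           # left-tailed: area to the left of z
p_rounded = NormalDist().cdf(round(z, 2))   # area at z = -2.98, as read from Table 4

print(f"Std. Err. = {std_err:.9f}")
print(f"Z-Stat    = {z:.7f}")
print(f"P-value (unrounded z)  = {p_unrounded:.4f}")
print(f"P-value (z = {round(z, 2)}) = {p_rounded:.4f}")
```

The first three printed values should match the StatCrunch output (0.033587572, -2.9772917, and 0.0015). The last line gives the Table 4 value, 0.0014.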


TRY IT YOURSELF 4 


Homeowners claim that the mean speed of automobiles traveling on their 
street is greater than the speed limit of 35 miles per hour. A random sample 
of 100 automobiles has a mean speed of 36 miles per hour. Assume the 
population standard deviation is 4 miles per hour. Is there enough evidence to 
support the claim at α = 0.05? Use a P-value.

Answer: Page A37 




See Minitab steps 
on page 436. 


Hypothesis Testing Using a P-Value 


According to a study of U.S. homes that use heating equipment, the mean 
indoor temperature at night during winter is 68.3°F. You think this information 
is incorrect. You randomly select 25 U.S. homes that use heating equipment 
in the winter and find that the mean indoor temperature at night is 67.2°F. 
From past studies, the population standard deviation is known to be 3.5°F and 
the population is normally distributed. Is there enough evidence to support 
your claim at α = 0.05? Use a P-value. (Adapted from U.S. Energy Information
Administration) 


SOLUTION 


Because σ is known (σ = 3.5°F), the sample is random, and the population is
normally distributed, you can use the z-test. The claim is “the mean is different 
from 68.3°F.” So, the null and alternative hypotheses are 


H₀: μ = 68.3°F and Hₐ: μ ≠ 68.3°F. (Claim)


The level of significance is α = 0.05. The standardized test statistic is


$$
\begin{aligned}
z &= \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} && \text{Because } \sigma \text{ is known and the population is normally distributed, use the z-test.}\\
  &= \frac{67.2 - 68.3}{3.5/\sqrt{25}} && \text{Assume } \mu = 68.3^\circ\text{F}.\\
  &\approx -1.57 && \text{Round to two decimal places.}
\end{aligned}
$$

In Table 4, the area corresponding to z = -1.57 is 0.0582. Because the test is a
two-tailed test, the P-value is equal to twice the area to the left of z = -1.57,
as shown in the figure.


[Figure: Two-Tailed Test. The area to the left of z = -1.57 is 0.0582, so P = 2(0.0582) = 0.1164.]


So, the P-value is P = 2(0.0582) = 0.1164. Because the P-value is greater than 
α = 0.05, you fail to reject the null hypothesis.


Interpretation There is not enough evidence at the 5% level of significance 
to support the claim that the mean indoor temperature at night during winter 
is different from 68.3°F for U.S. homes that use heating equipment. 


TRY IT YOURSELF 5 


According to a study of employed U.S. adults ages 18 and over, the mean 
number of workdays missed due to illness or injury in the past 12 months 
is 3.5 days. You randomly select 25 employed U.S. adults ages 18 and over 
and find that the mean number of workdays missed is 4 days. Assume the 
population standard deviation is 1.5 days and the population is normally 
distributed. Is there enough evidence to doubt the study’s claim at α = 0.01?
Use a P-value. (Adapted from U.S. National Center for Health Statistics) 

Answer: Page A37 




Tech Tip

Using a TI-84 Plus, you can either enter the original data into a list to find a P-value or enter the descriptive statistics.

STAT
Choose the TESTS menu.
1: Z-Test...

Select the Data input option when you use the original data. Select the Stats input option when you use the descriptive statistics. In each case, enter the appropriate values, including the corresponding type of hypothesis test indicated by the alternative hypothesis. Then select Calculate.


[Figures: Left-Tailed Test, Right-Tailed Test, Two-Tailed Test]


Using Technology to Find a P-Value 


Use the TI-84 Plus displays to make a decision to reject or fail to reject the null
hypothesis at a level of significance of α = 0.05.


TI-84 PLUS
Z-Test
Inpt: Data Stats
μ₀: 6.2
σ: .47
x̄: 6.07
n: 53
μ: ≠μ₀ <μ₀ >μ₀
Calculate Draw

TI-84 PLUS
Z-Test
μ≠6.2
z=-2.013647416
p=.0440464253


SOLUTION 


The P-value for this test is 0.0440464253. Because the P-value is less than 
α = 0.05, you reject the null hypothesis.
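You can recompute the calculator output from the Stats inputs on the first screen. The sketch below uses only Python's standard library and makes two assumptions:

- n = 53, which is the sample size that reproduces the displayed z = -2.013647416.
- The test is two-tailed, as indicated by μ≠6.2 on the output screen.

```python
from math import sqrt
from statistics import NormalDist

# Stats inputs from the TI-84 Plus screen; n = 53 is inferred from the displayed z
mu_0, sigma, x_bar, n = 6.2, 0.47, 6.07, 53
alpha = 0.05

z = (x_bar - mu_0) / (sigma / sqrt(n))   # standardized test statistic
p = 2 * NormalDist().cdf(-abs(z))        # two-tailed test (alternative: mu != 6.2)

print(f"z = {z:.9f}")
print(f"p = {p:.10f}")
print("reject H0" if p <= alpha else "fail to reject H0")
```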


TRY IT YOURSELF 6 


Repeat Example 6 using a level of significance of α = 0.01.
Answer: Page A37 


Rejection Regions and Critical Values 


Another method to decide whether to reject the null hypothesis is to determine 
whether the standardized test statistic falls within a range of values called the 
rejection region of the sampling distribution. 


DEFINITION 


A rejection region (or critical region) of the sampling distribution is the range 
of values for which the null hypothesis is not probable. If a standardized test 
statistic falls in this region, then the null hypothesis is rejected. A critical 
value z₀ separates the rejection region from the nonrejection region.


GUIDELINES 


Finding Critical Values in the Standard Normal Distribution 
1. Specify the level of significance α.
2. Determine whether the test is left-tailed, right-tailed, or two-tailed.
3. Find the critical value(s) z₀. When the hypothesis test is
a. left-tailed, find the z-score that corresponds to an area of α.
b. right-tailed, find the z-score that corresponds to an area of 1 - α.
c. two-tailed, find the z-scores that correspond to ½α and 1 - ½α.


4. Sketch the standard normal distribution. Draw a vertical line at each 
critical value and shade the rejection region(s). (See the figures at the left.) 


Note that a standardized test statistic that falls in a rejection region is 
considered an unusual event. 
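You can also find critical values with the inverse of the standard normal CDF instead of searching a table. The sketch below assumes Python 3.8+ and `statistics.NormalDist.inv_cdf`, and `critical_values` is a hypothetical helper. The function returns exact quantiles rather than the closest Table 4 entry, so its results can differ slightly from the table. For example, it gives 2.576 where the table's midway rule gives 2.575.

```python
from statistics import NormalDist

def critical_values(alpha, tail):
    """Critical value(s) z0 for a z-test, following the three cases in step 3."""
    std_normal = NormalDist()
    if tail == "left":
        return (std_normal.inv_cdf(alpha),)            # area alpha to the left
    if tail == "right":
        return (std_normal.inv_cdf(1 - alpha),)        # area 1 - alpha to the left
    if tail == "two":
        return (std_normal.inv_cdf(alpha / 2),         # area alpha/2 to the left
                std_normal.inv_cdf(1 - alpha / 2))     # area 1 - alpha/2 to the left
    raise ValueError('tail must be "left", "right", or "two"')

# Commonly used cases, rounded to three decimals
for alpha in (0.10, 0.05, 0.01):
    for tail in ("left", "right", "two"):
        print(alpha, tail, [round(v, 3) for v in critical_values(alpha, tail)])
```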




When you cannot find the exact area in Table 4, use the area that is closest. 
For an area that is exactly midway between two areas in the table, use the z-score 
midway between the corresponding z-scores. 


Finding a Critical Value for a Left-Tailed Test 
Find the critical value and rejection region for a left-tailed test with α = 0.01.


SOLUTION 


The figure shows the standard normal curve with a shaded area of 0.01 in the left
tail. In Table 4, the z-score that is closest to an area of 0.01 is -2.33. So, the
critical value is

z₀ = -2.33.

The rejection region is to the left of this critical value. You can check your answer
using technology, as shown below.

[Figure: 1% Level of Significance, with critical value z₀ = -2.33]


1 | NORM.S.INV(0.01)
2 | -2.32634787


TRY IT YOURSELF 7 


Find the critical value and rejection region for a left-tailed test with α = 0.10.
Answer: Page A37 


Because normal distributions are symmetric, in a two-tailed test the critical 
values are opposites, as shown in the next example. 


EXAMPLE 8 


Finding Critical Values for a Two-Tailed Test 
Find the critical values and rejection regions for a two-tailed test with α = 0.05.


Study Tip
The table lists the critical values for commonly used levels of significance.

Alpha | Tail | z
0.10 | Left | -1.28
0.10 | Right | 1.28
0.10 | Two | ±1.645
0.05 | Left | -1.645
0.05 | Right | 1.645
0.05 | Two | ±1.96
0.01 | Left | -2.33
0.01 | Right | 2.33
0.01 | Two | ±2.575

SOLUTION

The figure shows the standard normal curve with shaded areas of ½α = 0.025 in
each tail. The area to the left of -z₀ is ½α = 0.025, and the area to the left of z₀
is 1 - ½α = 0.975. In Table 4, the z-scores that correspond to the areas 0.025 and
0.975 are -1.96 and 1.96, respectively. So, the critical values are

-z₀ = -1.96 and z₀ = 1.96.

The rejection regions are to the left of -1.96 and to the right of 1.96.

[Figure: 5% Level of Significance]


TRY IT YOURSELF 8 


Find the critical values and rejection regions for a two-tailed test with α = 0.08.
Answer: Page A37 




Using Rejection Regions for a z-Test 


To conclude a hypothesis test using rejection region(s), you make a decision and 
interpret the decision according to the next rule. 


Decision Rule Based on Rejection Region 


To use a rejection region to conduct a hypothesis test, calculate the standardized 
test statistic z. If the standardized test statistic 


1. is in the rejection region, then reject H₀.
2. is not in the rejection region, then fail to reject H₀.


[Figures: Left-Tailed Test (z < z₀: Reject H₀; otherwise, fail to reject H₀). Right-Tailed Test (z > z₀: Reject H₀; otherwise, fail to reject H₀). Two-Tailed Test (z < -z₀ or z > z₀: Reject H₀; otherwise, fail to reject H₀).]


Remember, failing to reject the null hypothesis does not mean that you have 
accepted the null hypothesis as true. It simply means that there is not enough 
evidence to reject the null hypothesis. 


GUIDELINES 


Using Rejection Regions for a z-Test for a Mean μ (σ Known)
In Words | In Symbols
1. Verify that σ is known, the sample is random, and either the population is normally distributed or n ≥ 30.
2. State the claim mathematically and verbally. Identify the null and alternative hypotheses. | State H₀ and Hₐ.
3. Specify the level of significance. | Identify α.
4. Determine the critical value(s). | Use Table 4 in Appendix B.
5. Determine the rejection region(s).
6. Find the standardized test statistic and sketch the sampling distribution. | $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
7. Make a decision to reject or fail to reject the null hypothesis. | If z is in the rejection region, then reject H₀. Otherwise, fail to reject H₀.
8. Interpret the decision in the context of the original claim.
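Here is a sketch of the rejection-region guidelines in Python, using only the standard library. The function name `z_test_rejection_region` and its `tail` convention are hypothetical. The critical values come from `NormalDist.inv_cdf`, so they are exact quantiles rather than Table 4 lookups (for example, -1.6449 instead of -1.645).

```python
from math import sqrt
from statistics import NormalDist

def z_test_rejection_region(x_bar, mu_0, sigma, n, alpha, tail):
    """Return (z, z0, reject) for a z-test decided by rejection region(s).

    For a two-tailed test, z0 is the positive critical value; the regions are
    z < -z0 and z > z0.
    """
    std_normal = NormalDist()
    z = (x_bar - mu_0) / (sigma / sqrt(n))      # standardized test statistic
    if tail == "left":
        z0 = std_normal.inv_cdf(alpha)
        reject = z < z0
    elif tail == "right":
        z0 = std_normal.inv_cdf(1 - alpha)
        reject = z > z0
    elif tail == "two":
        z0 = std_normal.inv_cdf(1 - alpha / 2)
        reject = z < -z0 or z > z0
    else:
        raise ValueError('tail must be "left", "right", or "two"')
    return z, z0, reject

# Salary example that follows: left-tailed, alpha = 0.05
z, z0, reject = z_test_rejection_region(85900, 88200, 9500, 20, 0.05, "left")
print(f"z = {z:.2f}, z0 = {z0:.3f}, reject H0: {reject}")
```

For the salary example, z ≈ -1.08 is not less than z₀ ≈ -1.645, so `reject` is `False`. This matches the decision to fail to reject H₀.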


Picturing the World

Each year, the Environmental Protection Agency (EPA) publishes reports of gas
mileage for all makes and models of passenger vehicles. In a recent year, the
small station wagon with an automatic transmission that posted the best mileage
had a mean mileage of 52 miles per gallon (city) and 49 miles per gallon
(highway). An auto manufacturer claims its station wagons exceed 49 miles per
gallon on the highway. To support its claim, it tests 36 vehicles on highway
driving and obtains a sample mean of 51.2 miles per gallon. Assume the population
standard deviation is 4.8 miles per gallon. (Source: U.S. Department of Energy)

Is the evidence strong enough to support the claim that the station wagon’s
highway miles per gallon exceeds the EPA estimate? Use a z-test with α = 0.01.




See TI-84 Plus 
steps on page 437. 


Hypothesis Testing Using a Rejection Region 


Employees at a construction and mining company claim that the mean 
salary of the company’s mechanical engineers is less than that of one of its 
competitors, which is $88,200. A random sample of 20 of the company’s 
mechanical engineers has a mean salary of $85,900. Assume the population 
standard deviation is $9500 and the population is normally distributed. At 
α = 0.05, test the employees’ claim.


SOLUTION 


Because σ is known (σ = $9500), the sample is random, and the population
is normally distributed, you can use the z-test. The claim is “the mean salary 
is less than $88,200.” So, the null and alternative hypotheses can be written as 


H₀: μ ≥ $88,200 and Hₐ: μ < $88,200. (Claim)


Because the test is a left-tailed test and the level of significance is α = 0.05,
the critical value is z₀ = -1.645 and the rejection region is z < -1.645. The
standardized test statistic is


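$$
\begin{aligned}
z &= \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} && \text{Because } \sigma \text{ is known and the population is normally distributed, use the z-test.}\\
  &= \frac{85{,}900 - 88{,}200}{9500/\sqrt{20}} && \text{Assume } \mu = \$88{,}200.\\
  &\approx -1.08 && \text{Round to two decimal places.}
\end{aligned}
$$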


The figure shows the location of the rejection region and the standardized test 
statistic z. Because z is not in the rejection region, you fail to reject the null 
hypothesis. 


[Figure: 5% Level of Significance (1 - α = 0.95), with critical value z₀ = -1.645 and z = -1.08]


Interpretation There is not enough evidence at the 5% level of significance 
to support the employees’ claim that the mean salary is less than $88,200. 


Be sure you understand the decision made in this example. Even though 
your sample has a mean of $85,900, you cannot (at a 5% level of significance) 
support the claim that the mean of all the mechanical engineers’ salaries is less 
than $88,200. The difference between your test statistic (x̄ = $85,900) and the
hypothesized mean (μ = $88,200) is probably due to sampling error.


TRY IT YOURSELF 9 


The CEO of the company in Example 9 claims that the mean workday of the 
company’s mechanical engineers is less than 8.5 hours. A random sample of 
25 of the company’s mechanical engineers has a mean workday of 8.2 hours. 
Assume the population standard deviation is 0.5 hour and the population is 
normally distributed. At α = 0.01, test the CEO’s claim.

Answer: Page A37 




Hypothesis Testing Using Rejection Regions 


A researcher claims that the mean annual cost of raising a child (age 2 and 
under) by married-couple families in the U.S. is $14,050. In a random sample 
of married-couple families in the U.S., the mean annual cost of raising a child 
(age 2 and under) is $13,795. The sample consists of 500 children. Assume the 
population standard deviation is $2875. At α = 0.10, is there enough evidence
to reject the claim? (Adapted from U.S. Department of Agriculture Center for 
Nutrition Policy and Promotion) 


SOLUTION 


Because σ is known (σ = $2875), the sample is random, and n = 500 ≥ 30,
you can use the z-test. The claim is “the mean annual cost is $14,050.” So, the 
null and alternative hypotheses are 


H₀: μ = $14,050 (Claim) and Hₐ: μ ≠ $14,050.


Because the test is a two-tailed test and the level of significance is α = 0.10,
the critical values are -z₀ = -1.645 and z₀ = 1.645. The rejection regions are
z < -1.645 and z > 1.645. The standardized test statistic is


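$$
\begin{aligned}
z &= \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} && \text{Because } \sigma \text{ is known and } n \ge 30, \text{ use the z-test.}\\
  &= \frac{13{,}795 - 14{,}050}{2875/\sqrt{500}} && \text{Assume } \mu = \$14{,}050.\\
  &\approx -1.98 && \text{Round to two decimal places.}
\end{aligned}
$$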


The figure shows the location of the rejection regions and the standardized test 
statistic z. Because z is in the rejection region, you reject the null hypothesis. 


[Figure: 10% Level of Significance (1 - α = 0.90)]


You can check your answer using technology, as shown below. 


MINITAB

One-Sample Z

Test of μ = 14050 vs ≠ 14050
The assumed standard deviation = 2875

N | Mean | SE Mean | 90% CI | P
500 | 13795 | 129 | (13584, 14006) | 0.047

Interpretation There is enough evidence at the 10% level of significance to 
reject the claim that the mean annual cost of raising a child (age 2 and under) 
by married-couple families in the U.S. is $14,050. 
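Minitab's SE Mean, 90% confidence interval, and P-value all follow from the summary statistics. The sketch below uses only Python's standard library. The interval is computed as x̄ ± z₀(σ/√n), with z₀ from `NormalDist.inv_cdf`. It is an assumption that Minitab forms its interval this way (the standard z-interval).

```python
from math import sqrt
from statistics import NormalDist

# Child-cost example: two-tailed z-test, sigma known
x_bar, mu_0, sigma, n, alpha = 13795, 14050, 2875, 500, 0.10

se_mean = sigma / sqrt(n)                     # "SE Mean" column
z = (x_bar - mu_0) / se_mean                  # standardized test statistic
p = 2 * NormalDist().cdf(-abs(z))             # two-tailed P-value
z_0 = NormalDist().inv_cdf(1 - alpha / 2)     # critical value for a 90% z-interval
ci = (x_bar - z_0 * se_mean, x_bar + z_0 * se_mean)

print(f"SE Mean = {se_mean:.0f}")
print(f"90% CI = ({ci[0]:.0f}, {ci[1]:.0f})")
print(f"z = {z:.2f}, P = {p:.3f}")
print("reject H0" if p <= alpha else "fail to reject H0")
```

The interval excludes $14,050 exactly when P ≤ 0.10. This mirrors the rejection-region decision in the example.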


TRY IT YOURSELF 10 


In Example 10, at α = 0.01, is there enough evidence to reject the claim?
Answer: Page A37 




7.2 EXERCISES For Extra Help: MyLab Statistics


Building Basic Skills and Vocabulary 


1. Explain the difference between the z-test for μ using a P-value and the z-test
for μ using rejection region(s).


2. In hypothesis testing, does using the critical value method or the P-value 
method affect your conclusion? Explain. 


Interpreting a P-Value In Exercises 3-8, the P-value for a hypothesis test is
shown. Use the P-value to decide whether to reject H₀ when the level of significance
is (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.


3. P = 0.0461 4. P = 0.0691 
5. P = 0.1271 6. P = 0.0107 
7. P = 0.0838 8. P = 0.0062 


Finding a P-Value In Exercises 9-14, find the P-value for the hypothesis test
with the standardized test statistic z. Decide whether to reject H₀ for the level of
significance α.

9. Left-tailed test, z = -1.32, α = 0.10
10. Left-tailed test, z = -1.55, α = 0.05
11. Right-tailed test, z = 2.46, α = 0.01
12. Right-tailed test, z = 1.23, α = 0.10
13. Two-tailed test, z = -1.68, α = 0.05
14. Two-tailed test, z = 1.95, α = 0.08


Graphical Analysis In Exercises 15 and 16, match each P-value with the
graph that displays its area without performing any calculations. Explain
your reasoning.

15. P = 0.0089 and P = 0.3050
[Graphs (a) and (b): standard normal curves with shaded areas, labeled z = -2.37 and z = -0.51]

16. P = 0.0688 and P = 0.2802
[Graphs (a) and (b): standard normal curves with shaded areas]




In Exercises 17 and 18, use the TI-84 Plus displays to make a decision to reject or 
fail to reject the null hypothesis at the level of significance. 


17. α = 0.05
[TI-84 Plus Z-Test display]

18. α = 0.01
[TI-84 Plus Z-Test display]


Finding Critical Values and Rejection Regions In Exercises 19-24,
find the critical value(s) and rejection region(s) for the type of z-test with level of
significance α. Include a graph with your answer.

19. Left-tailed test, α = 0.03
20. Left-tailed test, α = 0.09
21. Right-tailed test, α = 0.05
22. Right-tailed test, α = 0.08
23. Two-tailed test, α = 0.02
24. Two-tailed test, α = 0.12


Graphical Analysis In Exercises 25 and 26, state whether each standardized
test statistic z allows you to reject the null hypothesis. Explain your reasoning.

25. (a) z = -1.301 (b) z = 1.203 (c) z = 1.280 (d) z = 1.286
26. (a) z = 1.98 (b) z = 1.89 (c) z = 1.65 (d) z = -1.99


In Exercises 27-30, test the claim about the population mean μ at the level of
significance α. Assume the population is normally distributed.
27. Claim: μ = 40; α = 0.05; σ = 1.97
Sample statistics: x̄ = 39.2, n = 25
28. Claim: μ = 1475; α = 0.07; σ = 29
Sample statistics: x̄ = 1468, n = 26
29. Claim: μ ≠ 5880; α = 0.03; σ = 413
Sample statistics: x̄ = 5771, n = 67
30. Claim: μ = 22,500; α = 0.01; σ = 1200
Sample statistics: x̄ = 23,500, n = 45




Using and Interpreting Concepts 


Hypothesis Testing Using a P-Value In Exercises 31-36,
(a) identify the claim and state H₀ and Hₐ.
(b) find the standardized test statistic z.
(c) find the corresponding P-value.
(d) decide whether to reject or fail to reject the null hypothesis.
(e) interpret the decision in the context of the original claim.


31. MCAT Scores A random sample of 100 medical school applicants at a
university has a mean total score of 502 on the MCAT. According to a
report, the mean total score for the school’s applicants is more than 499.
Assume the population standard deviation is 10.6. At α = 0.01, is there
enough evidence to support the report’s claim? (Source: Association of
American Medical Colleges)

32. Smoke Detectors A manufacturer of smoke detectors designed for fire
protection claims that the average area that the smoke detector covers is at
least 60 square meters. To test this claim, you randomly select a sample of
22 systems and find the mean coverage area to be 58 square meters. Assume
the population standard deviation is 3.5 square meters. At α = 0.10, do you
have enough evidence to reject the manufacturer’s claim?

33. Boston Marathon A sports statistician claims that the mean winning
time for Boston Marathon women’s open division champions is at least
2.68 hours. The mean winning time of a sample of 30 randomly selected
Boston Marathon women’s open division champions is 2.60 hours. Assume
the population standard deviation is 0.32 hour. At α = 0.05, can you reject
the claim? (Source: Boston Athletic Association)

34. Acceleration Times A consumer group claims that the mean acceleration
time from 0 to 60 miles per hour for a sedan is 6.3 seconds. A random sample
of 33 sedans has a mean acceleration time from 0 to 60 miles per hour of
7.2 seconds. Assume the population standard deviation is 2.5 seconds. At
α = 0.05, can you reject the claim? (Source: Zero to 60 Times)


35. Roller Coasters The heights (in feet) of 36 randomly selected
top-rated roller coasters are listed. Assume the population standard
deviation is 71.6 feet. At α = 0.05, is there enough evidence to reject
the claim that the mean height of top-rated roller coasters is 160 feet?
(Source: POP World Media, LLC)


325 188 306 107 208 167 105 78 140 
232 230 170 170 205 305 135 200 200 
100 223 135 195 80 90 120 210 82 
161 245 88 70 116 121 146 149 124 


36. Salaries An analyst claims that the mean annual salary for 
intermediate level architects in Wichita, Kansas, is more than the 
national mean, $52,000. The annual salaries (in dollars) for a random 
sample of 21 intermediate level architects in Wichita are listed. Assume 
the population is normally distributed and the population standard 
deviation is $8000. At α = 0.09, is there enough evidence to support the
analyst’s claim? (Adapted from Salary.com) 


47,066 58,955 59,774 56,016 52,487 41,258 43,806 
44,291 44,063 44,365 40,120 49,853 50,233 43,827 
56,085 48,967 57,983 60,295 57,776 46,500 47,658 




[TABLE FOR EXERCISE 42: Carbon dioxide emissions (in megatons)]


Hypothesis Testing Using Rejection Region(s) In Exercises 37-42,
(a) identify the claim and state H₀ and Hₐ, (b) find the critical value(s) and identify
the rejection region(s), (c) find the standardized test statistic z, (d) decide whether
to reject or fail to reject the null hypothesis, and (e) interpret the decision in the
context of the original claim.


37. Caffeine Content A consumer research organization states that the mean
caffeine content per 12-ounce bottle of a population of caffeinated soft
drinks is 37.7 milligrams. You want to test this claim. During your tests,
you find that a random sample of thirty-six 12-ounce bottles of caffeinated
soft drinks has a mean caffeine content of 36.4 milligrams. Assume the
population standard deviation is 10.8 milligrams. At α = 0.01, can you reject
the research organization’s claim? (Source: National Soft Drink Association)

38. High School Graduation Rate An education researcher claims that the
mean high school graduation rate per state in the United States is 80%. You
want to test this claim. You find that a random sample of 30 states has a
mean high school graduation rate of 82%. Assume the population standard
deviation is 5.1%. At α = 0.05, do you have enough evidence to support the
researcher’s claim? (Source: U.S. Department of Education)

39. Cookies A cookie manufacturer claims that the mean sugar content in
each of the cookies produced is no more than 18%. A random sample
of 56 cookies has a mean sugar content of 19%. Assume the population
standard deviation is 4%. At α = 0.02, do you have enough evidence to
reject the manufacturer’s claim?

40. LED Lamps An LED lamp manufacturer guarantees that the mean life
of a certain type of LED lamp is at least 25,000 hours. A random sample
of 49 LED lamps has a mean life of 24,800 hours. Assume the population is
normally distributed and the population standard deviation is 500 hours. At
α = 0.05, do you have enough evidence to reject the manufacturer’s claim?


41. Fluorescent Lamps A fluorescent lamp manufacturer guarantees that
the mean life of a fluorescent lamp is at least 10,000 hours. You want to
test this guarantee. To do so, you record the lives of a random sample
of 32 fluorescent lamps. The results (in hours) are listed. Assume the
population standard deviation is 1850 hours. At α = 0.11, do you have
enough evidence to reject the manufacturer’s claim?

8,800 9,155 13,001 10,250 10,002 11,413 8,234 10,402
10,016 8,015 6,110 11,005 11,555 9,254 6,991 12,006
10,420 8,302 8,151 10,980 10,186 10,003 8,814 11,445
6,277 8,632 7,265 10,584 9,397 11,987 7,556 10,380


42. Carbon Dioxide Emissions A scientist estimates that the mean
carbon dioxide emissions per country in a recent year are greater than
150 megatons. You want to test this estimate. To do so, you determine
the carbon dioxide emissions for 42 randomly selected countries for that
year. The results (in megatons) are shown in the table at the left. Assume
the population standard deviation is 816 megatons. At α = 0.06, can
you support the scientist’s estimate? (Source: Global Carbon Project)


Extending Concepts 


43. Writing When P > α, does the standardized test statistic lie inside or
outside of the rejection region(s)? Explain your reasoning.

44. Writing In a right-tailed test where P < α, does the standardized test
statistic lie to the left or the right of the critical value? Explain your reasoning.


Hypothesis Testing for the Mean (σ Unknown)

What You Should Learn
» How to find critical values in a t-distribution
» How to use the t-test to test a mean μ when σ is not known
» How to use technology to find P-values and use them with a t-test to test a mean μ when σ is not known


SECTION 7.3 Hypothesis Testing for the Mean (a Unknown) 399 


Critical Values in a t-Distribution ■ The t-Test for a Mean μ ■ Using
P-Values with t-Tests


Critical Values in a t-Distribution


In Section 7.2, you learned how to perform a hypothesis test for a population
mean when the population standard deviation is known. In many real-life
situations, the population standard deviation is not known. When either the
population has a normal distribution or the sample size is at least 30, you can still
test the population mean μ. To do so, you can use the t-distribution with n - 1
degrees of freedom.


GUIDELINES 


Finding Critical Values in a t-Distribution
1. Specify the level of significance α.
2. Identify the degrees of freedom, d.f. = n - 1.
3. Find the critical value(s) using Table 5 in Appendix B in the row with
n - 1 degrees of freedom. When the hypothesis test is
a. left-tailed, use the “One Tail, α” column with a negative sign.
b. right-tailed, use the “One Tail, α” column with a positive sign.
c. two-tailed, use the “Two Tails, α” column with a negative and a
positive sign.
See the figures below.


[Figures: Left-Tailed Test, Right-Tailed Test, Two-Tailed Test]

[Figure: 5% Level of Significance]


Finding a Critical Value for a Left-Tailed Test 
Find the critical value t₀ for a left-tailed test with α = 0.05 and n = 21.


SOLUTION 
The degrees of freedom are 
d.f. = n - 1 = 21 - 1 = 20.


To find the critical value, use Table 5 in Appendix B with d.f. = 20 and
α = 0.05 in the “One Tail, α” column. Because the test is left-tailed, the critical
value is negative. So, t₀ = -1.725, as shown in the figure at the left.


TRY IT YOURSELF 1 


Find the critical value t₀ for a left-tailed test with α = 0.01 and n = 14.
Answer: Page A37 




[Figure: 10% Level of Significance, with critical values -t₀ = -1.708 and t₀ = 1.708]


Finding a Critical Value for a Right-Tailed Test 
Find the critical value t₀ for a right-tailed test with α = 0.01 and n = 17.


SOLUTION 
The degrees of freedom are 
df.=n-1 
=17-1 
= 16. 


To find the critical value, use Table 5 with d.f. = 16 and α = 0.01 in the
“One Tail, α” column. Because the test is right-tailed, the critical value is
positive. So,

t₀ = 2.583

as shown in the figure.

[Figure: 1% Level of Significance, with critical value t₀ = 2.583]


TRY IT YOURSELF 2 


Find the critical value t₀ for a right-tailed test with α = 0.10 and n = 9.
Answer: Page A37 


Because t-distributions are symmetric, in a two-tailed test the critical values
are opposites, as shown in the next example.


Finding Critical Values for a Two-Tailed Test 
Find the critical values -t₀ and t₀ for a two-tailed test with α = 0.10 and n = 26.


SOLUTION 
The degrees of freedom are 
d.f. = n - 1
= 26 - 1
= 25.


To find the critical values, use Table 5 with d.f. = 25 and α = 0.10 in the “Two
Tails, α” column. Because the test is two-tailed, one critical value is negative
and one is positive. So,

-t₀ = -1.708 and t₀ = 1.708


as shown in the figure at the left. You can check your answer using technology, 
as shown below. 


1 | T.INV.2T(0.1,25)
2 | 1.708140761

TRY IT YOURSELF 3 


Find the critical values -t₀ and t₀ for a two-tailed test with α = 0.05 and n = 16.
Answer: Page A37 


Picturing the World

Exposure to lead may cause health problems ranging from stomach distress
to brain damage. The Environmental Protection Agency established rules that
require water systems to monitor drinking water at customer taps. If lead
concentrations exceed 0.015 milligram per liter in more than 10% of customer
taps sampled, the system must undertake a number of actions, such as source
water treatment, public education, and lead service line replacement. On the
basis of a t-test, a water system makes a decision on whether the mean level
of lead in the water exceeds the allowable amount of 0.015 milligram per liter.
Assume the null hypothesis is μ ≤ 0.015.
(Source: Environmental Protection Agency)

 | H₀ True | H₀ False
Fail to reject H₀. | |
Reject H₀. | |

Describe the possible type I and type II errors of this situation.




The t-Test for a Mean μ


To test a claim about a mean μ when σ is not known, you can use a t-sampling
distribution. The standardized test statistic takes the form of

$$\text{Standardized test statistic} = \frac{(\text{Sample mean}) - (\text{Hypothesized mean})}{\text{Standard error}}.$$


Because σ is not known, the standardized test statistic is calculated using the
sample standard deviation s, as shown in the next definition. 


t-Test for a Mean μ


The t-test for a mean μ is a statistical test for a population mean. The test
statistic is the sample mean x̄. The standardized test statistic is

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \qquad \text{Standardized test statistic for } \mu \ (\sigma \text{ unknown})$$

when these conditions are met.


1. The sample is random. 


2. At least one of the following is true: The population is normally distributed
or n ≥ 30.


The degrees of freedom are d.f. = n - 1.
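For raw data, you can compute the standardized test statistic and the degrees of freedom directly. This minimal sketch uses Python's `statistics.mean` and `statistics.stdev`. The `stdev` function divides by n - 1, so it matches the sample standard deviation s. The function name `t_statistic` and the five-value sample are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, mu_0):
    """Return t = (x̄ - μ0)/(s/√n) and the degrees of freedom n - 1."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)                  # sample standard deviation (divisor n - 1)
    t = (x_bar - mu_0) / (s / sqrt(n))
    return t, n - 1

# Hypothetical sample of five values, tested against mu_0 = 5
t, df = t_statistic([4, 6, 5, 7, 8], 5)
print(f"t = {t:.4f}, d.f. = {df}")
```

You would then compare t with the Table 5 critical value(s) in the row for those degrees of freedom.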


GUIDELINES

Using the t-Test for a Mean μ (σ Unknown)

In Words (In Symbols)
1. Verify that σ is not known, the sample is random, and either the population is normally distributed or n ≥ 30.
2. State the claim mathematically and verbally. Identify the null and alternative hypotheses. (In symbols: State H₀ and Hₐ.)
3. Specify the level of significance. (In symbols: Identify α.)
4. Identify the degrees of freedom. (In symbols: d.f. = n − 1)
5. Determine the critical value(s). (In symbols: Use Table 5 in Appendix B.)
6. Determine the rejection region(s).
7. Find the standardized test statistic and sketch the sampling distribution. (In symbols: $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$)
8. Make a decision to reject or fail to reject the null hypothesis. (In symbols: If t is in the rejection region, then reject H₀. Otherwise, fail to reject H₀.)
9. Interpret the decision in the context of the original claim.


In Step 8 of the guidelines, the decision rule uses rejection regions. You can 
also test a claim using P-values, as shown on page 404. Also, when the number 
of degrees of freedom you need is not in Table 5, use the closest number in the 
table that is less than the value you need (or use technology). For instance, for 
d.f. = 57, use 50 degrees of freedom. 

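Steps 5, 6, and 8 of the guidelines, together with the rule for d.f. values missing from Table 5, can be sketched as follows. The list `TABLE_DF_ROWS` is an assumption chosen to be consistent with the examples in this section (38 appears, 57 does not), so check it against the actual Table 5 before relying on it. The helper names `table_df` and `decide` are hypothetical, and the critical value is passed in as a positive magnitude.

```python
from bisect import bisect_right

# ASSUMED rows of a printed t-table (d.f. 1-40, then selected larger values).
# Consistent with this section's examples, but verify against Table 5.
TABLE_DF_ROWS = list(range(1, 41)) + [45, 50, 60, 70, 80, 90, 100, 500, 1000]


def table_df(df):
    """Largest tabulated d.f. that does not exceed df (for example, 57 -> 50)."""
    i = bisect_right(TABLE_DF_ROWS, df)
    if i == 0:
        raise ValueError("d.f. must be at least 1")
    return TABLE_DF_ROWS[i - 1]


def decide(t, t0, tail):
    """Step 8: reject H0 when t falls in the rejection region.

    t0 is the critical value as a positive number; tail is 'left', 'right', or 'two'.
    """
    if tail == "left":
        reject = t < -t0          # rejection region t < -t0
    elif tail == "right":
        reject = t > t0           # rejection region t > t0
    elif tail == "two":
        reject = abs(t) > t0      # rejection regions t < -t0 and t > t0
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return "reject H0" if reject else "fail to reject H0"


print(table_df(57), table_df(13))         # 50 13
print(decide(-2.5, 1.771, "left"))        # reject H0
```

Rounding d.f. down rather than to the nearest row is the conservative choice: a smaller d.f. gives a larger critical value, so the test is slightly less likely to reject H₀.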

402 CHAPTER 7 Hypothesis Testing with One Sample


To explore this topic further,
see Activity 7.3 on page 408.


See Minitab steps 
on page 436. 


Hypothesis Testing Using a Rejection Region 

A used car dealer says that the mean price of used cars sold in the last 
12 months is at least $21,000. You suspect this claim is incorrect and find 
that a random sample of 14 used cars sold in the last 12 months has a mean 
price of $19,189 and a standard deviation of $2950. Is there enough evidence 
to reject the dealer’s claim at α = 0.05? Assume the population is normally
distributed. (Adapted from Edmunds.com) 

SOLUTION 


Because σ is unknown, the sample is random, and the population is normally distributed, you can use the t-test. The claim is “the mean price is at least $21,000.” So, the null and alternative hypotheses are

H₀: μ ≥ $21,000 (Claim)
and
Hₐ: μ < $21,000.

The test is a left-tailed test, the level of significance is α = 0.05, and the degrees of freedom are

d.f. = 14 − 1 = 13.


So, using Table 5, the critical value is t₀ = −1.771. The rejection region is t < −1.771. Because σ is unknown and the population is normally distributed, use the t-test. Assuming μ = 21,000 and rounding to three decimal places, the standardized test statistic is

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{19{,}189 - 21{,}000}{2950/\sqrt{14}} \approx -2.297$$

The figure shows the location of the rejection region and the standardized test statistic t. Because t is in the rejection region, you reject the null hypothesis.

[Figure: t-distribution at the 5% level of significance, showing the rejection region t < t₀ = −1.771 and the test statistic t ≈ −2.297.]

Interpretation There is enough evidence at the 5% level of significance to reject the claim that the mean price of used cars sold in the last 12 months is at least $21,000.

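A quick Python check of the arithmetic in Example 4. This is only a sketch: the critical value −1.771 is copied from Table 5, as in the solution, not computed.

```python
import math

x_bar, mu0, s, n = 19_189, 21_000, 2950, 14   # Example 4 sample statistics
t0 = -1.771                                    # Table 5: left tail, alpha = 0.05, d.f. = 13

t = (x_bar - mu0) / (s / math.sqrt(n))         # standardized test statistic
print(round(t, 3))                             # -2.297
print("reject H0" if t < t0 else "fail to reject H0")   # reject H0
```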

TRY IT YOURSELF 4 


An industry analyst says that the mean age of a used car sold in the last 
12 months is less than 4.1 years. A random sample of 25 used cars sold in 
the last 12 months has a mean age of 3.7 years and a standard deviation of 
1.3 years. Is there enough evidence to support the analyst’s claim at α = 0.10?
Assume the population is normally distributed. (Adapted from Edmunds.com) 
Answer: Page A37 


Remember that when you make a decision, the possibility of a type I or a type II error exists. For instance, in Example 4, a type I error is possible when you reject H₀, because μ ≥ $21,000 may be true.




See TI-84 Plus 
steps on page 437. 


Hypothesis Testing Using Rejection Regions 


An industrial company claims that the mean pH level of the water in a nearby 
river is 6.8. You randomly select 39 water samples and measure the pH of each. 
The sample mean and standard deviation are 6.7 and 0.35, respectively. Is there 
enough evidence to reject the company’s claim at α = 0.05?


SOLUTION 


Because σ is unknown, the sample is random, and n = 39 ≥ 30, you can use the t-test. The claim is “the mean pH level is 6.8.” So, the null and alternative hypotheses are

H₀: μ = 6.8 (Claim) and Hₐ: μ ≠ 6.8.

The test is a two-tailed test, the level of significance is α = 0.05, and the degrees of freedom are d.f. = 39 − 1 = 38. So, using Table 5, the critical values are −t₀ = −2.024 and t₀ = 2.024. The rejection regions are t < −2.024 and t > 2.024. Because σ is unknown and n ≥ 30, use the t-test. Assuming μ = 6.8 and rounding to three decimal places, the standardized test statistic is

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{6.7 - 6.8}{0.35/\sqrt{39}} \approx -1.784$$


The figure shows the location of the rejection regions and the standardized test statistic t. Because t is not in the rejection region, you fail to reject the null hypothesis. You can confirm this decision using technology, as shown below. Note that values computed by technology may differ slightly from the hand calculation due to rounding.

[Figure: t-distribution at the 5% level of significance, with area ½α = 0.025 in each tail, critical values −t₀ = −2.024 and t₀ = 2.024, and the test statistic t = −1.784.]


MINITAB

One-Sample T
Test of μ = 6.8 vs ≠ 6.8

N    Mean     StDev    SE Mean   95% CI              P
39   6.7000   0.3500   0.0560    (6.5865, 6.8135)    0.082


Interpretation There is not enough evidence at the 5% level of significance 
to reject the claim that the mean pH level is 6.8. 

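The same check for Example 5, now with a two-tailed rejection rule. As before, this is a sketch in which the critical value 2.024 is taken from Table 5 rather than computed.

```python
import math

x_bar, mu0, s, n = 6.7, 6.8, 0.35, 39          # Example 5 sample statistics
t0 = 2.024                                      # Table 5: two tails, alpha = 0.05, d.f. = 38

t = (x_bar - mu0) / (s / math.sqrt(n))          # standardized test statistic
print(round(t, 3))                              # -1.784
print("reject H0" if abs(t) > t0 else "fail to reject H0")   # fail to reject H0
```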

TRY IT YOURSELF 5 


The company in Example 5 claims that the mean conductivity of the river is 
1890 milligrams per liter. The conductivity of a water sample is a measure of 
the total dissolved solids in the sample. You randomly select 39 water samples 
and measure the conductivity of each. The sample mean and standard deviation 
are 2350 milligrams per liter and 900 milligrams per liter, respectively. Is there 
enough evidence to reject the company’s claim at α = 0.01?

Answer: Page A37 






Tech Tip

Using a TI-84 Plus, you can either enter the original data into a list to find a P-value or enter the descriptive statistics.

STAT
Choose the TESTS menu.
2: T-Test...

Select the Data input option when you use the original data. Select the Stats input option when you use the descriptive statistics. In each case, enter the appropriate values, including the corresponding type of hypothesis test indicated by the alternative hypothesis. Then select Calculate.

TI-84 PLUS
[Screen: T-Test input showing μ₀: 14, n: 10, the alternatives ≠μ₀, <μ₀, >μ₀, and the Calculate and Draw options.]


Using P-Values with t-Tests


You can also use P-values for a t-test for a mean μ. For instance, consider finding a P-value given t = 1.98, 15 degrees of freedom, and a right-tailed test. Using Table 5 in Appendix B, you can determine that P falls between

α = 0.025 and α = 0.05

because 1.98 lies between the one-tail critical values 1.753 (α = 0.05) and 2.131 (α = 0.025) for 15 degrees of freedom. However, you cannot determine an exact value for P from the table. In such cases, you can use technology to perform a hypothesis test and find exact P-values.

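To see why the table only brackets P while technology gives an exact value, this sketch computes the right-tail area beyond t = 1.98 with 15 d.f. numerically. The helpers `t_pdf`, `t_cdf`, and `t_p_value` are hypothetical: they integrate the t density with Simpson's rule because Python's standard library has no t-distribution. A statistics package (for example, SciPy's `scipy.stats.t.sf`) would be the usual route.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x) via Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_p_value(t, df, tail):
    """P-value of a t-test; tail is 'left', 'right', or 'two'."""
    if tail == "left":
        return t_cdf(t, df)
    if tail == "right":
        return 1 - t_cdf(t, df)
    if tail == "two":
        return 2 * (1 - t_cdf(abs(t), df))
    raise ValueError("tail must be 'left', 'right', or 'two'")


p = t_p_value(1.98, 15, "right")
print(round(p, 4))
print(0.025 < p < 0.05)    # True: consistent with the bracket read from Table 5
```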

Using P-Values with a t-Test

A department of motor vehicles office claims that the mean wait time is less 
than 14 minutes. A random sample of 10 people has a mean wait time of 
13 minutes with a standard deviation of 3.5 minutes. At α = 0.10, test the
office’s claim. Assume the population is normally distributed. 

SOLUTION 


Because σ is unknown, the sample is random, and the population is normally distributed, you can use the t-test. The claim is “the mean wait time is less than 14 minutes.” So, the null and alternative hypotheses are

H₀: μ ≥ 14 minutes
and
Hₐ: μ < 14 minutes. (Claim)

The TI-84 Plus display at the far left shows how to set up the hypothesis test. The two displays on the right show the possible results, depending on whether you select Calculate or Draw.

TI-84 PLUS
[Screens: T-Test results for μ < 14, showing t = −.9035079029 and p = .1948994027.]

From the displays, you can see that P ≈ 0.1949.


Because the P-value is greater than α = 0.10, you fail to reject the null hypothesis.


Interpretation There is not enough evidence at the 10% level of significance 
to support the office’s claim that the mean wait time is less than 14 minutes. 

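The TI-84 Plus result in Example 6 can be reproduced with the same numerical-integration sketch. The helpers `t_pdf` and `t_cdf` are repeated so the block runs on its own; they are hypothetical stand-ins for a library t-distribution.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x) via Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


x_bar, mu0, s, n, alpha = 13, 14, 3.5, 10, 0.10   # Example 6
t = (x_bar - mu0) / (s / math.sqrt(n))
p = t_cdf(t, n - 1)          # left-tailed test: P is the area to the left of t
print(round(t, 4), round(p, 4))                  # -0.9035 0.1949
print("reject H0" if p <= alpha else "fail to reject H0")   # fail to reject H0
```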

TRY IT YOURSELF 6 


Another department of motor vehicles office claims that the mean wait time 
is at most 18 minutes. A random sample of 12 people has a mean wait time 
of 15 minutes with a standard deviation of 2.2 minutes. At α = 0.05, test the
office’s claim. Assume the population is normally distributed. 

Answer: Page A37 


7.3 EXERCISES 




For Extra Help: MyLab Statistics 


Building Basic Skills and Vocabulary 
1. Explain how to find critical values for a t-distribution.

2. Explain how to use a t-test to test a hypothesized mean μ when σ is unknown. What assumptions are necessary?


In Exercises 3-8, find the critical value(s) and rejection region(s) for the type of t-test with level of significance α and sample size n.

3. Left-tailed test, α = 0.10, n = 20
4. Left-tailed test, α = 0.01, n = 35
5. Right-tailed test, α = 0.05, n = 23
6. Right-tailed test, α = 0.01, n = 31
7. Two-tailed test, α = 0.05, n = 27
8. Two-tailed test, α = 0.10, n = 38


Graphical Analysis In Exercises 9-12, state whether each standardized test statistic t allows you to reject the null hypothesis. Explain.

9. (a) t = 2.091 (b) t = 0 (c) t = −2.096
[Graph: t-distribution with critical value t₀ = −2.086]

10. (a) t = 1.4 (b) t = 1.42 (c) t = −1.402
[Graph: t-distribution with critical value t₀ = 1.402]

11. (a) t = −1.755 (b) t = −1.585 (c) t = 1.745
[Graph: t-distribution with rejection region(s)]

12. (a) t = −1.1 (b) t = 1.01 (c) t = 1.7
[Graph: t-distribution with critical values −t₀ = −1.071 and t₀ = 1.071]


In Exercises 13-18, test the claim about the population mean y at the level of 
significance a. Assume the population is normally distributed. 


13. Claim: μ = 15; α = 0.01. Sample statistics: x̄ = 13.9, s = 3.23, n = 36
14. Claim: μ > 25; α = 0.05. Sample statistics: x̄ = 26.2, s = 2.32, n = 17


15. Claim: μ [relation symbol illegible] 8000; α = 0.01. Sample statistics: x̄ = 7700, s = 450, n = 25


16. Claim: μ = 1600; α = 0.02. Sample statistics: x̄ = 1550, s = 165, n = 46
17. Claim: μ < 4915; α = 0.02. Sample statistics: x̄ = 5017, s = 5613, n = 51
18. Claim: μ ≠ 52,200; α = 0.05. Sample statistics: x̄ = 53,220, s = 2700, n = 34




Annual salaries
100,651   82,505  102,450   91,091
 96,309   74,193   76,184   82,088
 93,551   77,012  104,020   85,063
112,717   80,970  103,982  110,316
TABLE FOR EXERCISE 25

Annual salaries
89,245  86,013  83,151  69,771
87,834  67,964  76,523  90,268
90,440  93,538  76,999  68,257
TABLE FOR EXERCISE 26


Using and Interpreting Concepts 


Hypothesis Testing Using Rejection Regions In Exercises 19-26, (a) identify the claim and state H₀ and Hₐ, (b) find the critical value(s) and identify the rejection region(s), (c) find the standardized test statistic t, (d) decide whether to reject or fail to reject the null hypothesis, and (e) interpret the decision in the context of the original claim. Assume the population is normally distributed.

19. Warehouse Rent An estate agent says that the mean rental of a warehouse (with basic amenities) is $2,500. You suspect this claim is incorrect and find that a random sample of 38 such warehouses has a mean rental of $2,850 and a standard deviation of $700. Is there enough evidence to reject the claim at α = 0.01?

20. Customer Care Wait Times A call center claims that the mean wait time for a customer to connect with a customer care executive is at most 2.5 minutes. A random sample of 35 calls at the call center has a mean wait time of 2.75 minutes and a standard deviation of 1.25 minutes. Is there enough evidence to reject the claim at α = 0.05?

21. Credit Card Debt A credit reporting agency claims that the mean credit card debt by state is greater than $5500 per person. You want to test this claim. You find that a random sample of 30 states has a mean credit card debt of $5594 per person and a standard deviation of $597 per person. At α = 0.05, can you support the claim? (Adapted from TransUnion)

22. Flash Drive Cycles A company claims that the mean number of usage cycles of their flash drives is at least 25,000. You suspect this claim is incorrect and find that a random sample of 15 flash drives has a mean of 24,500 usage cycles and a standard deviation of 750 usage cycles. Is there enough evidence to reject the claim at α = 0.01?

23. Carbon Monoxide Levels As part of your work for an environmental awareness group, you want to test a claim that the mean amount of carbon monoxide in the air in U.S. cities is less than 2.34 parts per million. You find that the mean amount of carbon monoxide in the air for a random sample of 64 U.S. cities is 2.37 parts per million and the standard deviation is 2.11 parts per million. At α = 0.10, can you support the claim? (Adapted from U.S. Environmental Protection Agency)

24. Lead Levels As part of your work for an environmental awareness group, you want to test a claim that the mean amount of lead in the air in U.S. cities is less than 0.036 microgram per cubic meter. You find that the mean amount of lead in the air for a random sample of 56 U.S. cities is 0.039 microgram per cubic meter and the standard deviation is 0.069 microgram per cubic meter. At α = 0.01, can you support the claim? (Adapted from U.S. Environmental Protection Agency)

25. Annual Salary An employment information service claims the mean annual salary for senior level product engineers is $98,000. The annual salaries (in dollars) for a random sample of 16 senior level product engineers are shown in the table at the left. At α = 0.05, test the claim that the mean salary is $98,000. (Adapted from Salary.com)

26. Annual Salary An employment information service claims the mean annual salary for home care physical therapists is more than $80,000. The annual salaries (in dollars) for a random sample of 12 home care physical therapists are shown in the table at the left. At α = 0.10, is there enough evidence to support the claim that the mean salary is more than $80,000? (Adapted from Salary.com)


Number of deliveries per day
25  23  26           28  32  29
23  21  [illegible]  25  29  26
31  27  28           24  35  30
TABLE FOR EXERCISE 29

Delivery hours
5.4  5.5  6.3  6.2
4.9  5.6  6.1  6.2
TABLE FOR EXERCISE 30




Using a P-Value with a t-Test In Exercises 27-30, (a) identify the claim and state H₀ and Hₐ, (b) use technology to find the P-value, (c) decide whether to reject or fail to reject the null hypothesis, and (d) interpret the decision in the context of the original claim. Assume the population is normally distributed.


27. Quarter Mile Times A consumer group claims that the mean minimum time it takes for a sedan to travel a quarter mile is greater than 14.7 seconds. A random sample of 22 sedans has a mean minimum time to travel a quarter mile of 15.4 seconds and a standard deviation of 2.10 seconds. At α = 0.10, do you have enough evidence to support the consumer group’s claim? (Adapted from Zero to 60 Times)

28. Dive Duration An oceanographer claims that the mean dive duration of a North Atlantic right whale is 11.5 minutes. A random sample of 34 dive durations has a mean of 12.2 minutes and a standard deviation of 2.2 minutes. Is there enough evidence to reject the claim at α = 0.10? (Source: Marine Ecology Progress Series)

29. Courier Deliveries You receive a brochure from a courier company. The brochure indicates that the mean number of deliveries per deliveryman is more than 28 packets per day. You want to test this claim. You randomly select 18 deliverymen and determine the number of deliveries of each person per day. The results are shown in the table at the left. At α = 0.05, can you support the company’s claim?

30. Delivery Hours The director of a courier company estimates that the mean time spent by a deliveryman in dropping off packages per day is 6.0 hours. As a member of the labor union, you want to test this claim. A random sample of the number of hours for eight deliverymen for a day is shown in the table at the left. At α = 0.01, can you reject the director’s claim?


Extending Concepts 


Deciding on a Distribution In Exercises 31 and 32, decide whether you should use the standard normal sampling distribution or a t-sampling distribution to perform the hypothesis test. Justify your decision. Then use the distribution to test the claim. Write a short paragraph about the results of the test and what you can conclude about the claim.

31. Gas Mileage A car company claims that the mean gas mileage for its luxury sedan is at least 23 miles per gallon. You believe the claim is incorrect and find that a random sample of 5 cars has a mean gas mileage of 22 miles per gallon and a standard deviation of 4 miles per gallon. At α = 0.05, test the company’s claim. Assume the population is normally distributed.

32. Tuition and Fees An education publication claims that the mean in-state tuition and fees at public four-year institutions by state is more than $9000 per year. A random sample of 30 states has a mean in-state tuition and fees at public four-year institutions of $9231 per year. Assume the population standard deviation is $2380. At α = 0.01, test the publication’s claim. (Adapted from The College Board)

33. Writing You are testing a claim and incorrectly use the standard normal sampling distribution instead of the t-sampling distribution. Does this make it more or less likely to reject the null hypothesis? Is this result the same no matter whether the test is left-tailed, right-tailed, or two-tailed? Explain your reasoning.


ACTIVITY

Hypothesis Tests for a Mean

APPLET
You can find the interactive applet for this activity within MyLab Statistics or at www.pearsonglobaleditions.com.


The hypothesis tests for a mean applet allows you to visually investigate hypothesis 
tests for a mean. You can specify the sample size n, the shape of the distribution 
(Normal or Right skewed), the true population mean (Mean), the true population 
standard deviation (Std. Dev.), the null value for the mean (Null mean), and the 
alternative for the test (Alternative). When you click SIMULATE, 100 separate 
samples of size n will be selected from a population with these population 
parameters. For each of the 100 samples, a hypothesis test based on the T statistic 
is performed, and the results from each test are displayed in the plots at the right. 
The test statistic for each test is shown in the top plot and the P-value is shown 
in the bottom plot. The green and blue lines represent the cutoffs for rejecting 
the null hypothesis with the 0.05 and 0.01 level tests, respectively. Additional 
simulations can be carried out by clicking SIMULATE multiple times. The 
cumulative number of times that each test rejects the null hypothesis is also 
shown. Press CLEAR to clear existing results and start a new simulation. 


Step 1 Specify a value for n.
Step 2 Specify a distribution.
Step 3 Specify a value for the mean.
Step 4 Specify a value for the standard deviation.
Step 5 Specify a value for the null mean.
Step 6 Specify an alternative hypothesis.
Step 7 Click SIMULATE to generate the hypothesis tests.

[Applet screen: Distribution menu (Normal); fields for Mean, Std. Dev., Null mean, and Alternative; SIMULATE and CLEAR buttons; a Cumulative results table with rows Reject null, Fail to reject null, and Prop. rejected.]


DRAW CONCLUSIONS 


1. Set n = 15, Mean = 40, Std. Dev. = 5, and the distribution to “Normal.” 
Test the claim that the mean is equal to 40. What are the null and alternative 
hypotheses? Run the simulation so that at least 1000 hypothesis tests are run. 
Compare the proportion of null hypothesis rejections for the 0.05 level and the 
0.01 level. Is this what you would expect? Explain. 


2. Suppose a null hypothesis is rejected at the 0.01 level. Will it be rejected at the 
0.05 level? Explain. Suppose a null hypothesis is rejected at the 0.05 level. Will 
it be rejected at the 0.01 level? Explain. 


3. Set n = 25, Mean = 25, Std. Dev. = 3, and the distribution to “Normal.” 
Test the claim that the mean is at least 27. What are the null and alternative 
hypotheses? Run the simulation so that at least 1000 hypothesis tests are run. 
Compare the proportion of null hypothesis rejections for the 0.05 level and the 
0.01 level. Is this what you would expect? Explain. 

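Offline, the simulation in Draw Conclusions 1 can be imitated with a short script. This approximates what the applet does; it is not the applet's actual code. It draws normal samples of size 15 with mean 40 and standard deviation 5, runs a two-tailed t-test of H₀: μ = 40 on each sample, and uses t-table critical values for d.f. = 14 (2.145 and 2.977). The names `rejection_rates` and `T_CRIT`, the seed, and the number of tests are arbitrary illustrative choices; `statistics.fmean` needs Python 3.8 or later.

```python
import math
import random
import statistics

N = 15                                   # sample size used in Draw Conclusions 1
T_CRIT = {0.05: 2.145, 0.01: 2.977}      # two-tailed t-table values for d.f. = 14


def rejection_rates(true_mean=40.0, sd=5.0, null_mean=40.0, tests=2000, seed=1):
    """Proportion of two-tailed t-tests (H0: mu = null_mean) rejected at each level."""
    rng = random.Random(seed)
    rejected = {alpha: 0 for alpha in T_CRIT}
    for _ in range(tests):
        sample = [rng.gauss(true_mean, sd) for _ in range(N)]
        se = statistics.stdev(sample) / math.sqrt(N)
        t = (statistics.fmean(sample) - null_mean) / se
        for alpha, t0 in T_CRIT.items():
            if abs(t) > t0:
                rejected[alpha] += 1
    return {alpha: count / tests for alpha, count in rejected.items()}


print(rejection_rates())
```

Because H₀ is true in this setup, every rejection is a type I error, so the proportions should land near 0.05 and 0.01.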



In an article in the Journal of Statistics Education (vol. 4, no. 2), Allen Shoemaker describes a study that was reported in the Journal of the American Medical Association (JAMA).* It is generally accepted that the mean body temperature of an adult human is 98.6°F. In his article, Shoemaker uses the data from the JAMA article to test this hypothesis. Here is a summary of his test.

Claim: The body temperature of adults is 98.6°F.
H₀: μ = 98.6°F (Claim)   Hₐ: μ ≠ 98.6°F
Sample Size: n = 130
Population: Adult human temperatures (Fahrenheit)
Distribution: Approximately normal
Test Statistics: x̄ ≈ 98.25, s ≈ 0.73

* Data for the JAMA article were collected from healthy men and women, ages 18 to 40, at the University of Maryland Center for Vaccine Development, Baltimore.


EXERCISES 


1. Complete the hypothesis test for all adults (men and women) by performing the following steps. Use a level of significance of α = 0.05.


(a) Sketch the sampling distribution. 

(b) Determine the critical values and add them 
to your sketch. 

(c) Determine the rejection regions and shade 
them in your sketch. 

(d) Find the standardized test statistic. Plot and 
label it in your sketch. 

(e) Make a decision to reject or fail to reject the 
null hypothesis. 

(f) Interpret the decision in the context of the 
original claim. 


Human Body Temperature: What's Normal?

Men’s Temperatures (in degrees Fahrenheit)
96 | 3
96 | 79
97 | 0111234444
97 | 556667888899
98 | 000000112222334444
98 | 55666666778889
99 | 0001234
99 | 5
100 |
100 |
Key: 96|3 = 96.3

Women’s Temperatures (in degrees Fahrenheit)
96 | 4
96 | 78
97 | 224
97 | 677888999
98 | 00000122222233344444
98 | 5666677777788888889
99 | 00112234
99 | 9
100 | 0
100 | 8
Key: 96|4 = 96.4


2. If you lower the level of significance to α = 0.01, does your decision change? Explain your reasoning.

3. Test the hypothesis that the mean temperature of men is 98.6°F. What can you conclude at a level of significance of α = 0.01?

4. Test the hypothesis that the mean temperature of women is 98.6°F. What can you conclude at a level of significance of α = 0.01?

5. Use the sample of 130 temperatures to form a 99% confidence interval for the mean body temperature of adult humans.

6. The conventional “normal” body temperature was established by Carl Wunderlich over 100 years ago. What were possible sources of error in Wunderlich’s sampling procedure?

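The summary statistics in the case study can be checked against the stem-and-leaf displays. The sketch below transcribes the two displays into (stem, leaves) pairs and expands each leaf as a tenth of a degree. The names `MEN`, `WOMEN`, and `expand` are illustrative, and the same lists could be reused for Exercises 3 and 4.

```python
import statistics

# (stem, leaves) rows transcribed from the stem-and-leaf displays above
MEN = [(96, "3"), (96, "79"), (97, "0111234444"), (97, "556667888899"),
       (98, "000000112222334444"), (98, "55666666778889"),
       (99, "0001234"), (99, "5")]
WOMEN = [(96, "4"), (96, "78"), (97, "224"), (97, "677888999"),
         (98, "00000122222233344444"), (98, "5666677777788888889"),
         (99, "00112234"), (99, "9"), (100, "0"), (100, "8")]


def expand(rows):
    """Turn (stem, leaves) rows into temperatures, e.g. 96|3 -> 96.3."""
    return [stem + int(leaf) / 10 for stem, leaves in rows for leaf in leaves]


men, women = expand(MEN), expand(WOMEN)
temps = men + women
print(len(men), len(women))                 # 65 65
print(round(statistics.mean(temps), 2))     # 98.25
print(round(statistics.stdev(temps), 2))    # 0.73
```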

Case Study 409 




What You Should Learn

• How to use the z-test to test a population proportion p

Study Tip

A hypothesis test for a proportion p can also be performed using P-values. Use the guidelines on page 387 for using P-values for a z-test for a mean μ, but in Step 4 find the standardized test statistic by using the formula

$$z = \frac{\hat{p} - p}{\sqrt{pq/n}}$$

The other steps in the test are the same.


7.4 Hypothesis Testing for Proportions




Hypothesis Test for Proportions 




In Sections 7.2 and 7.3, you learned how to perform a hypothesis test for a population mean μ. In this section, you will learn how to test a population proportion p.

Hypothesis tests for proportions can be used when politicians want to know the proportion of their constituents who favor a certain bill or when quality assurance engineers test the proportion of parts that are defective.

If np ≥ 5 and nq ≥ 5 for a binomial distribution, then the sampling distribution for p̂ is approximately normal with a mean of $\mu_{\hat{p}} = p$ and a standard error of

$$\sigma_{\hat{p}} = \sqrt{pq/n}.$$

z-Test for a Proportion p

The z-test for a proportion p is a statistical test for a population proportion. The z-test can be used when a binomial distribution is given such that np ≥ 5 and nq ≥ 5. The test statistic is the sample proportion p̂ and the standardized test statistic is

$$z = \frac{\hat{p} - p}{\sqrt{pq/n}} \qquad \text{(Standardized test statistic for } p\text{)}$$


GUIDELINES

Using a z-Test for a Proportion p

In Words (In Symbols)
1. Verify that the sampling distribution of p̂ can be approximated by a normal distribution. (In symbols: np ≥ 5, nq ≥ 5)
2. State the claim mathematically and verbally. Identify the null and alternative hypotheses. (In symbols: State H₀ and Hₐ.)
3. Specify the level of significance. (In symbols: Identify α.)
4. Determine the critical value(s). (In symbols: Use Table 4 in Appendix B.)
5. Determine the rejection region(s).
6. Find the standardized test statistic and sketch the sampling distribution. (In symbols: $z = \frac{\hat{p} - p}{\sqrt{pq/n}}$)
7. Make a decision to reject or fail to reject the null hypothesis. (In symbols: If z is in the rejection region, then reject H₀. Otherwise, fail to reject H₀.)
8. Interpret the decision in the context of the original claim.


In Step 7 of the guidelines, the decision rule uses rejection regions. You can 
also test a claim using P-values, as shown in the Study Tip at the left. 

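The guidelines translate into a short function. This is a sketch: `z_test_proportion` is a hypothetical name, and it uses Python's `statistics.NormalDist.inv_cdf` (Python 3.8+) for the critical values in place of Table 4, so it returns unrounded values such as 1.645 or 2.326 where the table gives 1.645 or 2.33. The example numbers are made up.

```python
import math
from statistics import NormalDist


def z_test_proportion(p_hat, p0, n, alpha, tail):
    """Return (z, critical value, reject?) for a z-test for a proportion."""
    q0 = 1 - p0
    if n * p0 < 5 or n * q0 < 5:                      # Step 1: normal approximation
        raise ValueError("need np >= 5 and nq >= 5")
    z = (p_hat - p0) / math.sqrt(p0 * q0 / n)         # Step 6: standardized statistic
    std_normal = NormalDist()
    if tail == "left":                                # Steps 4-5: z0 and rejection region
        z0 = std_normal.inv_cdf(alpha)                # negative; region z < z0
        reject = z < z0
    elif tail == "right":
        z0 = std_normal.inv_cdf(1 - alpha)            # region z > z0
        reject = z > z0
    elif tail == "two":
        z0 = std_normal.inv_cdf(1 - alpha / 2)        # regions z < -z0 and z > z0
        reject = abs(z) > z0
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return z, z0, reject                              # Step 7: reject H0 if True


# Made-up example: p-hat = 0.30, p = 0.25, n = 400, alpha = 0.05, right-tailed
z, z0, reject = z_test_proportion(0.30, 0.25, 400, 0.05, "right")
print(round(z, 2), round(z0, 3), reject)              # 2.31 1.645 True
```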

To explore this topic further, 
see Activity 7.4 on page 415. 


Study Tip 


Remember that when you fail to reject H₀, a type II error is possible. For instance, in Example 1 the null hypothesis, p ≥ 0.45, may be false.


SECTION 7.4 Hypothesis Testing for Proportions 411 


See TI-84 Plus 
steps on page 437. 


Hypothesis Test for a Proportion 


A researcher claims that less than 45% of U.S. adults use passwords that are 
less secure because complicated ones are too hard to remember. In a random 
sample of 100 adults, 41% say they use passwords that are less secure because 
complicated ones are too hard to remember. At α = 0.01, is there enough
evidence to support the researcher’s claim? (Adapted from Pew Research Center) 


SOLUTION 

The products np = 100(0.45) = 45 and nq = 100(0.55) = 55 are both greater than 5. So, you can use a z-test. The claim is “less than 45% of U.S. adults use passwords that are less secure because complicated ones are too hard to remember.” So, the null and alternative hypotheses are

H₀: p ≥ 0.45 and Hₐ: p < 0.45. (Claim)

Because the test is a left-tailed test and the level of significance is α = 0.01, the critical value is z₀ = −2.33 and the rejection region is z < −2.33. Because np ≥ 5 and nq ≥ 5, you can use the z-test. Assuming p = 0.45 and rounding to two decimal places, the standardized test statistic is

$$z = \frac{\hat{p} - p}{\sqrt{pq/n}} = \frac{0.41 - 0.45}{\sqrt{(0.45)(0.55)/100}} \approx -0.80$$


The figure shows the location of the rejection region and the standardized test statistic z. Because z is not in the rejection region, you fail to reject the null hypothesis.

[Figure: standard normal curve at the 1% level of significance, showing the rejection region z < z₀ = −2.33 and the test statistic z = −0.80.]

Interpretation There is not enough evidence at the 1% level of significance to support the claim that less than 45% of U.S. adults use passwords that are less secure because complicated ones are too hard to remember.


TRY IT YOURSELF 1 


A researcher claims that more than 90% of U.S. adults have access to a 
smartphone. In a random sample of 150 adults, 87% say they have access to a 
smartphone. At α = 0.01, is there enough evidence to support the researcher’s
claim? (Adapted from Nielsen Mobile Insights) 

Answer: Page A37 


To use a P-value to perform the hypothesis test in Example 1, you can use technology, as shown at the right, or you can use Table 4. Using Table 4, the area corresponding to z = −0.80 is 0.2119. Because this is a left-tailed test, the P-value is equal to the area to the left of z = −0.80. So, P = 0.2119. (This value differs from the one found using technology due to rounding.) Because the P-value is greater than α = 0.01, you fail to reject the null hypothesis. Note that this is the same result obtained in Example 1.

TI-84 PLUS
[Screen: 1-PropZTest results for Example 1.]

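Both P-values mentioned above can be reproduced with `statistics.NormalDist` from Python's standard library (Python 3.8+). In this sketch the only difference between them is whether z is rounded to −0.80 before the area is looked up.

```python
import math
from statistics import NormalDist

p_hat, p0, n = 0.41, 0.45, 100                    # Example 1
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)   # about -0.804 before rounding
p_exact = NormalDist().cdf(z)                     # left tail at the unrounded z
p_table = NormalDist().cdf(round(z, 2))           # left tail at z = -0.80, as in Table 4
print(round(p_exact, 4), round(p_table, 4))       # 0.2107 0.2119
print("reject H0" if p_exact <= 0.01 else "fail to reject H0")   # fail to reject H0
```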




Picturing the World

According to a survey, at least 35% of smartphone owners say the first thing they access on their phones each day is texts or instant messages. To test this claim, you randomly select 300 smartphone owners. In the sample, you find that 93 of them say the first thing they access on their phones each day is texts or instant messages. (Adapted from Deloitte’s 2016 Global Mobile Consumer Survey: U.S. edition)

At α = 0.05, is there enough evidence to reject the claim?

Recall from Section 6.3 that when the sample proportion is not given, you can find it using the formula

$$\hat{p} = \frac{x}{n} \qquad \text{(Sample proportion)}$$

where x is the number of successes in the sample and n is the sample size.

See Minitab steps on page 436.

Hypothesis Test for a Proportion

A researcher claims that 51% of U.S. adults believe, incorrectly, that antibiotics are effective against viruses. In a random sample of 2202 adults, 1161 say antibiotics are effective against viruses. At α = 0.10, is there enough evidence to reject the researcher’s claim? (Source: HealthDay/Harris Poll)

SOLUTION

The products np = 2202(0.51) ≈ 1123 and nq = 2202(0.49) ≈ 1079 are both greater than 5. So, you can use a z-test. The claim is “51% of U.S. adults believe, incorrectly, that antibiotics are effective against viruses.” So, the null and alternative hypotheses are

H₀: p = 0.51 (Claim) and Hₐ: p ≠ 0.51.


Because the test is a two-tailed test and the level of significance is α = 0.10, the critical values are −z₀ = −1.645 and z₀ = 1.645. The rejection regions are z < −1.645 and z > 1.645. Because the number of successes is x = 1161 and n = 2202, the sample proportion is

$$\hat{p} = \frac{x}{n} = \frac{1161}{2202} \approx 0.527$$

Because np ≥ 5 and nq ≥ 5, you can use the z-test. Assuming p = 0.51 and rounding to two decimal places, the standardized test statistic is

$$z = \frac{\hat{p} - p}{\sqrt{pq/n}} \approx \frac{0.527 - 0.51}{\sqrt{(0.51)(0.49)/2202}} \approx 1.60$$


The figure shows the location of the rejection regions and the standardized test statistic z. Because z is not in the rejection region, you fail to reject the null hypothesis.

[Figure: standard normal curve at the 10% level of significance, showing the rejection regions z < −1.645 and z > 1.645 and the test statistic z = 1.60.]

Interpretation There is not enough evidence at the 10% level of significance to reject the claim that 51% of U.S. adults believe, incorrectly, that antibiotics are effective against viruses.

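Example 2 as a quick Python sketch. Computing without intermediate rounding gives z ≈ 1.62; rounding p̂ to 0.527 first, as the solution does, gives the 1.60 shown above. Either way z lies between −1.645 and 1.645, so the decision is the same. The critical value comes from `statistics.NormalDist.inv_cdf` rather than Table 4.

```python
import math
from statistics import NormalDist

x, n, p0, alpha = 1161, 2202, 0.51, 0.10          # Example 2
p_hat = x / n                                     # sample proportion x / n
se = math.sqrt(p0 * (1 - p0) / n)                 # sqrt(pq/n) with p from H0
z = (p_hat - p0) / se                             # no intermediate rounding
z_book = (round(p_hat, 3) - p0) / se              # p-hat rounded to 0.527 first
z0 = NormalDist().inv_cdf(1 - alpha / 2)          # two-tailed critical value

print(round(p_hat, 3), round(z, 2), round(z_book, 2), round(z0, 3))  # 0.527 1.62 1.6 1.645
print("reject H0" if abs(z) > z0 else "fail to reject H0")          # fail to reject H0
```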

TRY IT YOURSELF 2 


A researcher claims that 67% of U.S. adults believe that doctors prescribing 
antibiotics for viral infections for which antibiotics are not effective is 
a significant cause of drug-resistant superbugs. (Superbugs are bacterial 
infections that are resistant to many or all antibiotics.) In a random sample 
of 1768 adults, 1150 say they believe that doctors prescribing antibiotics for 
viral infections for which antibiotics are not effective is a significant cause of 
drug-resistant superbugs. At α = 0.10, is there enough evidence to support the
researcher’s claim? (Source: HealthDay/Harris Poll) Answer: Page A37 




7.4 EXERCISES For Extra Help: MyLab Statistics


Building Basic Skills and Vocabulary 


1. Explain how to determine whether a normal distribution can be used to 
approximate a binomial distribution. 


2. Explain how to test a population proportion p. 


In Exercises 3-6, determine whether a normal sampling distribution can be used. If it can be used, test the claim.

3. Claim: p < 0.12; α = 0.01. Sample statistics: p̂ = 0.10, n = 40
4. Claim: p = 0.48; α = 0.08. Sample statistics: p̂ = 0.40, n = 90
5. Claim: p ≠ 0.15; α = 0.05. Sample statistics: p̂ = 0.12, n = 500
6. Claim: p > 0.70; α = 0.04. Sample statistics: p̂ = 0.64, n = 225


Using and Interpreting Concepts 


Hypothesis Testing Using Rejection Regions In Exercises 7-12,
(a) identify the claim and state H₀ and Hₐ, (b) find the critical value(s) and identify
the rejection region(s), (c) find the standardized test statistic z, (d) decide whether
to reject or fail to reject the null hypothesis, and (e) interpret the decision in the
context of the original claim.


7. Vaccination Requirement A medical researcher says that less than 80% of
U.S. adults think that healthy children should be required to be vaccinated.
In a random sample of 200 U.S. adults, 82% think that healthy children
should be required to be vaccinated. At α = 0.05, is there enough evidence
to support the researcher’s claim? (Adapted from Pew Research Center)


8. Internal Revenue Service Audits A research center claims that at least 
27% of U.S. adults think that the IRS will audit their taxes. In a random 
sample of 1000 U.S. adults in a recent year, 23% say they are concerned that 
the IRS will audit their taxes. At α = 0.01, is there enough evidence to reject
the center’s claim? (Source: Rasmussen Reports) 


9. Student Employment An education researcher claims that at most 3% of
working college students are employed as teachers or teaching assistants.
In a random sample of 200 working college students, 4% are employed as
teachers or teaching assistants. At α = 0.01, is there enough evidence to
reject the researcher’s claim? (Adapted from Sallie Mae) 


10. Working Students An education researcher claims that 57% of college 
students work year-round. In a random sample of 300 college students, 171 
say they work year-round. At α = 0.10, is there enough evidence to support
the researcher’s claim? (Adapted from Sallie Mae) 


11. Zika Virus A researcher claims that 85% of Americans think they
are unlikely to contract the Zika virus. In a random sample of 250 Americans,
225 think they are unlikely to contract the Zika virus. At α = 0.05, is there
enough evidence to reject the researcher’s claim? (Adapted from Gallup) 


12. Changing Jobs A research center claims that more than 29% of U.S.
employees have changed jobs in the past three years. In a random sample
of 180 U.S. employees, 63 have changed jobs in the past three years. At
α = 0.10, is there enough evidence to support the center’s claim? (Adapted
from Gallup) 


414 CHAPTER 7 Hypothesis Testing with One Sample 


[FIGURE FOR EXERCISES 17 AND 18: Protecting the Environment. How often adults say they make an effort to live in ways that help protect the environment: All of the time; Some of the time, 63%; Not too often, 13%; Not at all, 4%.]


Hypothesis Testing Using a P-Value In Exercises 13-16, (a) identify
the claim and state H₀ and Hₐ, (b) use technology to find the P-value, (c) decide
whether to reject or fail to reject the null hypothesis, and (d) interpret the decision 
in the context of the original claim. 


13. Space Travel A research center claims that 27% of U.S. adults would 
travel into space on a commercial flight if they could afford it. In a random 
sample of 1000 U.S. adults, 30% say that they would travel into space on 
a commercial flight if they could afford it. At α = 0.05, is there enough
evidence to reject the research center’s claim? (Source: Rasmussen Reports) 


14. Purchasing Food Online A research center claims that at most 18% of 
U.S. adults’ online food purchases are for snacks. In a random sample of
1995 U.S. adults, 20% say their online food purchases are for snacks. At
α = 0.10, is there enough evidence to support the center’s claim? (Source:
The Harris Poll) 


15. Pet Ownership A humane society claims that less than 67% of U.S. 
households own a pet. In a random sample of 600 U.S. households, 390 
say they own a pet. At α = 0.10, is there enough evidence to support the
society’s claim? (Adapted from The Humane Society of the United States) 


16. Stray Dogs A humane society claims that 5% of U.S. households have
taken in a stray dog. In a random sample of 200 U.S. households, 12 say they
have taken in a stray dog. At α = 0.05, is there enough evidence to reject the
society’s claim? (Adapted from The Humane Society of the United States) 


Protecting the Environment In Exercises 17 and 18, use the figure at the
left, which suggests what adults think about protecting the environment. (Source: 
Pew Research Center) 


17. Are People Concerned About Protecting the Environment? You interview 
a random sample of 100 adults. The results of the survey show that 59% of the
adults said they live in ways that help protect the environment some of the
time. At α = 0.05, can you reject the claim that at least 63% of adults make
an effort to live in ways that help protect the environment some of the time? 


18. What Are People’s Attitudes About Protecting the Environment? Use 
your conclusion from Exercise 17 to write a paragraph on people’s attitudes 
about protecting the environment. 


Extending Concepts 


Alternative Formula In Exercises 19 and 20, use the information below.
When you know the number of successes x, the sample size n, and the population
proportion p, it can be easier to use the formula

$$z = \frac{x - np}{\sqrt{npq}}$$

to find the standardized test statistic when using a z-test for a population
proportion p.


19. Rework Exercise 7 using the alternative formula and verify that the results 
are the same. 


20. The alternative formula is derived from the formula 


$$z = \frac{\hat{p} - p}{\sqrt{pq/n}} = \frac{(x/n) - p}{\sqrt{pq/n}}$$


Use this formula to derive the alternative formula. Justify each step. 
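As a quick numerical check that the two forms of the statistic agree, the Python sketch below evaluates both on the antibiotics example from this section (x = 1161, n = 2202, p = 0.51) rather than on an exercise. The function names are hypothetical. It is a numerical check only, not the algebraic derivation asked for in Exercise 20.

```python
from math import isclose, sqrt


def z_from_sample_proportion(x, n, p):
    """z = (p_hat - p) / sqrt(pq/n), where p_hat = x/n."""
    q = 1.0 - p
    return (x / n - p) / sqrt(p * q / n)


def z_from_count(x, n, p):
    """Alternative form: z = (x - np) / sqrt(npq)."""
    q = 1.0 - p
    return (x - n * p) / sqrt(n * p * q)


z1 = z_from_sample_proportion(1161, 2202, 0.51)
z2 = z_from_count(1161, 2202, 0.51)
print(round(z1, 4), round(z2, 4), isclose(z1, z2))
```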


ACTIVITY

You can find the interactive applet for this activity within MyLab Statistics or at
www.pearsonglobaleditions.com.


Hypothesis Tests for a Proportion 


The hypothesis tests for a proportion applet allows you to visually investigate 
hypothesis tests for a population proportion. You can specify the sample size n, the 
population proportion (True p), the null value for the proportion (Null p), and the 
alternative for the test (Alternative). When you click SIMULATE, 100 separate 
samples of size n will be selected from a population with a proportion of successes 
equal to True p. For each of the 100 samples, a hypothesis test based on the 
z statistic is performed, and the results from each test are displayed in plots at the
right. The standardized test statistic for each test is shown in the top plot and the 
P-value is shown in the bottom plot. The green and blue lines represent the cutoffs 
for rejecting the null hypothesis with the 0.05 and 0.01 level tests, respectively. 
Additional simulations can be carried out by clicking SIMULATE multiple times. 
The cumulative number of times that each test rejects the null hypothesis is also 
shown. Press CLEAR to clear existing results and start a new simulation. 
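The applet itself is not reproduced here, but its core loop is easy to imitate. The Python sketch below is a stand-in built on assumptions: it draws `reps` samples of size n from a population with proportion `true_p`, runs a one-proportion z-test of `null_p` on each, and reports the fraction of tests that reject H₀ at level `alpha`. The function and argument names (including the `"two"`, `"less"`, and `"greater"` alternatives) are hypothetical, and the applet may differ in details such as how it treats samples where np or nq is small.

```python
import random
from math import sqrt
from statistics import NormalDist


def rejection_rate(n, true_p, null_p, alternative, alpha, reps=1000, seed=0):
    """Fraction of simulated z-tests for a proportion that reject H0 (hypothetical helper).

    alternative: "two" (p != null_p), "less" (p < null_p), or "greater" (p > null_p).
    """
    rng = random.Random(seed)              # fixed seed so runs are repeatable
    std_normal = NormalDist()
    se = sqrt(null_p * (1 - null_p) / n)   # standard error when H0 is true
    rejections = 0
    for _ in range(reps):
        # One sample of size n from a population whose proportion of successes is true_p.
        successes = sum(rng.random() < true_p for _ in range(n))
        z = (successes / n - null_p) / se
        if alternative == "two":
            p_value = 2 * (1 - std_normal.cdf(abs(z)))
        elif alternative == "less":
            p_value = std_normal.cdf(z)
        elif alternative == "greater":
            p_value = 1 - std_normal.cdf(z)
        else:
            raise ValueError("alternative must be 'two', 'less', or 'greater'")
        if p_value <= alpha:
            rejections += 1
    return rejections / reps


# Settings from Draw Conclusions 1: n = 25, True p = Null p = 0.35, two-sided test.
for level in (0.05, 0.01):
    print(level, rejection_rate(25, 0.35, 0.35, "two", level, reps=1000))
```

Because the same seed produces the same samples at both levels, every test rejected at the 0.01 level is also rejected at the 0.05 level.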


[Figure: applet panel with inputs Null p: 0.5 and Alternative, Simulate and Clear buttons, and a cumulative results table with columns 0.05 level and 0.01 level and rows Reject null, Fail to reject null, and Prop. rejected.]


Step 1 Specify a value for n. 

Step 2 Specify a value for True p. 

Step 3 Specify a value for Null p. 

Step 4 Specify an alternative hypothesis. 

Step 5 Click SIMULATE to generate the hypothesis tests. 


DRAW CONCLUSIONS 


1. Set n = 25 and True p = 0.35. Test the claim that the proportion is equal 
to 35%. What are the null and alternative hypotheses? Run the simulation 
so that at least 1000 tests are run. Compare the proportion of null hypothesis 
rejections for the 0.05 and 0.01 levels. Is this what you would expect? Explain. 


2. Set n = 50 and True p = 0.6. Test the claim that the proportion is at least 
40%. What are the null and alternative hypotheses? Run the simulation so 
that at least 1000 tests are run. Compare the proportion of null hypothesis 
rejections for the 0.05 and 0.01 levels. Perform a hypothesis test for each level. 
Use the results of the hypothesis tests to explain the results of the simulation. 




What You Should Learn
» How to find critical values for a chi-square test
» How to use the chi-square test to test a variance σ² or a standard deviation σ


[Figures: chi-square curves showing the critical value χ²₀ for a right-tailed test, the critical value χ²₀ for a left-tailed test, and the critical values χ²_L and χ²_R for a two-tailed test.]


Hypothesis Testing for Variance and Standard Deviation 


Critical Values for a Chi-Square Test • The Chi-Square Test


Critical Values for a Chi-Square Test 


In real life, it is important to produce consistent, predictable results. For instance, 
consider a company that manufactures golf balls. The manufacturer must produce 
millions of golf balls, each having the same size and the same weight. There is 
a very low tolerance for variation. For a normally distributed population, you 
can test the variance and standard deviation of the process using the chi-square 
distribution with n − 1 degrees of freedom. Before learning how to do the test,
you must know how to find the critical values, as shown in the guidelines. 


GUIDELINES 


Finding Critical Values for a Chi-Square Test 
1. Specify the level of significance α.


2. Identify the degrees of freedom, d.f. = n − 1.


3. The critical values for the chi-square distribution are found in Table 6
in Appendix B. To find the critical value(s) for a

a. right-tailed test, use the value that corresponds to d.f. and α.
b. left-tailed test, use the value that corresponds to d.f. and 1 − α.
c. two-tailed test, use the values that correspond to d.f. and ½α, and
d.f. and 1 − ½α.


See the figures at the left. 
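The table lookup can also be done numerically. The Python sketch below is one way to do it with only the standard library, assuming integer degrees of freedom: `chi2_right_area` uses the closed-form chi-square tail-area formulas (a finite sum for even d.f., an erfc-based sum for odd d.f.), and `chi2_critical` inverts it by bisection. Like Table 6, it is indexed by the area to the right of the critical value. The function names are hypothetical; in practice a statistics library (for instance SciPy's `scipy.stats.chi2.isf`) would normally be used instead.

```python
import math


def chi2_right_area(x, df):
    """Area to the right of x under the chi-square curve with integer df >= 1."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: e^(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * e^(-x/2) * sum_{r=1}^{(df-1)/2} x^(r-1) / (1*3*...*(2r-1))
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def chi2_critical(area_right, df):
    """Value whose right-tail area is area_right (the lookup Table 6 performs)."""
    lo, hi = 0.0, df + 10.0
    while chi2_right_area(hi, df) > area_right:  # widen until the answer is bracketed
        hi *= 2.0
    for _ in range(100):                         # bisection on the decreasing tail area
        mid = (lo + hi) / 2.0
        if chi2_right_area(mid, df) > area_right:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def critical_values(n, alpha, tail):
    """Apply the guidelines: d.f. = n - 1, then choose the tail area(s)."""
    df = n - 1
    if tail == "right":
        return (chi2_critical(alpha, df),)
    if tail == "left":
        return (chi2_critical(1 - alpha, df),)
    if tail == "two":
        # (left critical value, right critical value)
        return (chi2_critical(1 - alpha / 2, df), chi2_critical(alpha / 2, df))
    raise ValueError("tail must be 'left', 'right', or 'two'")


print(critical_values(26, 0.10, "right"))  # right-tailed example below: about 34.382
print(critical_values(11, 0.01, "left"))   # left-tailed example: about 2.558
print(critical_values(9, 0.05, "two"))     # two-tailed example: about 2.180 and 17.535
```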


Finding a Critical Value for a Right-Tailed Test 
Find the critical value χ²₀ for a right-tailed test when n = 26 and α = 0.10.


SOLUTION 


The degrees of freedom are d.f. = n − 1 = 26 − 1 = 25. The figure below
shows a chi-square distribution with 25 degrees of freedom and a shaded area
of α = 0.10 in the right tail. Using Table 6 in Appendix B with d.f. = 25 and
α = 0.10, the critical value is χ²₀ = 34.382.


[Figure: chi-square distribution with 25 degrees of freedom (horizontal axis 5 to 45); shaded area α = 0.10 in the right tail beyond χ²₀ = 34.382.]


TRY IT YOURSELF 1 


Find the critical value χ²₀ for a right-tailed test when n = 18 and α = 0.01.
Answer: Page A37 


    A
1   CHISQ.INV(0.025,8)
2   2.179730747
3   CHISQ.INV.RT(0.025,8)
4   17.53454614


SECTION 7.5 Hypothesis Testing for Variance and Standard Deviation 417 


Finding a Critical Value for a Left-Tailed Test 
Find the critical value χ²₀ for a left-tailed test when n = 11 and α = 0.01.
SOLUTION
The degrees of freedom are
d.f. = n − 1 = 11 − 1 = 10.


The figure at the left shows a chi-square distribution with 10 degrees of
freedom and a shaded area of α = 0.01 in the left tail. The area to the right of
the critical value is

1 − α = 1 − 0.01 = 0.99.


Using Table 6 with d.f. = 10 and the area 0.99, the critical value is χ²₀ = 2.558.
You can check your answer using technology, as shown below. 


MINITAB

Inverse Cumulative Distribution Function
Chi-Square with 10 DF
P( X ≤ x )    x
0.01


TRY IT YOURSELF 2 


Find the critical value χ²₀ for a left-tailed test when n = 30 and α = 0.05.
Answer: Page A37 


Note that because chi-square distributions are not symmetric (unlike normal
or t-distributions), in a two-tailed test the two critical values are not opposites.
Each critical value must be calculated separately, as shown in the next example. 


Finding Critical Values for a Two-Tailed Test 
Find the critical values χ²_L and χ²_R for a two-tailed test when n = 9 and α = 0.05.


SOLUTION 
The degrees of freedom are

d.f. = n − 1 = 9 − 1 = 8.

The figure shows a chi-square distribution
with 8 degrees of freedom and a shaded area
of ½α = 0.025 in each tail. The area to the
right of χ²_R is ½α = 0.025, and the area to the
right of χ²_L is 1 − ½α = 0.975. Using Table 6
with d.f. = 8 and the areas 0.025 and 0.975,
the critical values are χ²_R = 17.535 and
χ²_L = 2.180. You can check your answer
using technology, as shown at the left.


TRY IT YOURSELF 3 


Find the critical values χ²_L and χ²_R for a two-tailed test when n = 51 and
α = 0.01. Answer: Page A37




The Chi-Square Test 


To test a variance σ² or a standard deviation σ of a population that is normally
distributed, you can use the chi-square test. The chi-square test for a variance or
standard deviation is not as robust as the tests for the population mean μ or the
population proportion p. So, it is essential in performing a chi-square test for a 
variance or standard deviation that the population be normally distributed. The 
results can be misleading when the population is not normal. 


Chi-Square Test for a Variance σ² or Standard Deviation σ


The chi-square test for a variance σ² or standard deviation σ is a statistical
test for a population variance or standard deviation. The chi-square test can
only be used when the population is normal. The test statistic is s², and the
standardized test statistic

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} \qquad \text{(standardized test statistic for } \sigma^2 \text{ or } \sigma\text{)}$$

follows a chi-square distribution with degrees of freedom
d.f. = n − 1.


In Step 8 of the guidelines below, the decision rule uses rejection regions. 
You can also test a claim using P-values (see Exercises 31-34). 


GUIDELINES 


Using the Chi-Square Test for a Variance σ² or a Standard Deviation σ
In Words | In Symbols

1. Verify that the sample is random and the population is normally distributed.

2. State the claim mathematically and verbally. Identify the null and alternative hypotheses. | State H₀ and Hₐ.

3. Specify the level of significance. | Identify α.

4. Identify the degrees of freedom. | d.f. = n − 1

5. Determine the critical value(s). | Use Table 6 in Appendix B.

6. Determine the rejection region(s).

7. Find the standardized test statistic and sketch the sampling distribution. | $\chi^2 = \frac{(n-1)s^2}{\sigma^2}$

8. Make a decision to reject or fail to reject the null hypothesis. | If χ² is in the rejection region, then reject H₀. Otherwise, fail to reject H₀.

9. Interpret the decision in the context of the original claim.


For Step 5 of the guidelines, in addition to using Table 6 in Appendix B, you 
can use technology to find the critical value(s). Also, some technology tools allow 
you to perform a hypothesis test for a variance (or a standard deviation) using 
only the descriptive statistics. 
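Steps 4, 7, and 8 of these guidelines can be sketched in Python from descriptive statistics alone. In this sketch (the name `chi_square_test` and its parameters are hypothetical), the critical values are passed in from Table 6 rather than computed. The sample calls reuse the three worked examples of this section; for a claim about a standard deviation, square both s and σ first.

```python
def chi_square_test(n, sample_var, hyp_var, crit_left=None, crit_right=None):
    """Chi-square test for a variance from summary statistics (hypothetical helper).

    Pass crit_right only for a right-tailed test, crit_left only for a
    left-tailed test, and both for a two-tailed test (values from Table 6).
    Returns (df, chi2, reject).
    """
    if crit_left is None and crit_right is None:
        raise ValueError("supply at least one critical value")
    df = n - 1                               # Step 4: degrees of freedom
    chi2 = df * sample_var / hyp_var         # Step 7: (n - 1)s^2 / sigma^2
    in_left = crit_left is not None and chi2 < crit_left
    in_right = crit_right is not None and chi2 > crit_right
    return df, chi2, in_left or in_right     # Step 8: reject if in a rejection region


# Milk fat: right-tailed, n = 41, s^2 = 0.27, sigma^2 = 0.25, critical value 55.758.
print(chi_square_test(41, 0.27, 0.25, crit_right=55.758))
# Call transfer times: left-tailed, s = 1.1 and sigma = 1.4, so square both.
print(chi_square_test(25, 1.1 ** 2, 1.4 ** 2, crit_left=15.659))
# Fishing line: two-tailed, n = 15, s^2 = 21.8, sigma^2 = 15.9.
print(chi_square_test(15, 21.8, 15.9, crit_left=5.629, crit_right=26.119))
```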


[Figure: chi-square distribution (horizontal axis 10 to 70) with rejection region χ² > χ²₀ = 55.758 (area α = 0.05); χ² = 43.2 lies outside it.]




Using a Hypothesis Test for the Population Variance 


A dairy processing company claims that the variance of the amount of fat in the 
whole milk processed by the company is no more than 0.25. You suspect this 
is wrong and find that a random sample of 41 milk containers has a variance 
of 0.27. At α = 0.05, is there enough evidence to reject the company’s claim?
Assume the population is normally distributed. 


SOLUTION 


Because the sample is random and the population is normally distributed, you 
can use the chi-square test. The claim is “the variance is no more than 0.25.” 
So, the null and alternative hypotheses are 


H₀: σ² ≤ 0.25 (Claim) and Hₐ: σ² > 0.25.


The test is a right-tailed test, the level of significance is α = 0.05, and the
degrees of freedom are d.f. = 41 − 1 = 40. So, using Table 6, the critical
value is

χ²₀ = 55.758.
The rejection region is χ² > 55.758. The standardized test statistic is

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{(41-1)(0.27)}{0.25} = 43.2$$

Use the chi-square test, and assume σ² = 0.25.


The figure at the left shows the location of the rejection region and the 
standardized test statistic χ². Because χ² is not in the rejection region, you fail
to reject the null hypothesis. You can check your answer using technology, as 
shown below. Note that the test statistic, 43.2, is the same as what you found 
above. 


STATCRUNCH 


One sample variance hypothesis test:
σ² : Variance of population
H₀ : σ² = 0.25
Hₐ : σ² > 0.25

Hypothesis test results:

Variance | Sample Var. | DF | Chi-square Stat | P-value
σ²       | 0.27        | 40 | 43.2            | 0.3362


Interpretation There is not enough evidence at the 5% level of significance 
to reject the company’s claim that the variance of the amount of fat in the 
whole milk is no more than 0.25. 


TRY IT YOURSELF 4 


A bottling company claims that the variance of the amount of sports drink in 
a 12-ounce bottle is no more than 0.40. A random sample of 31 bottles has a 
variance of 0.75. At α = 0.01, is there enough evidence to reject the company’s
claim? Assume the population is normally distributed. 

Answer: Page A37 




Study Tip 


Although you are testing a standard deviation in Example 5, the standardized
test statistic χ² requires variance. Remember to square the standard
deviation to calculate the variance.


Using a Hypothesis Test for the Standard Deviation 


A company claims that the standard deviation of the lengths of time it takes 
an incoming telephone call to be transferred to the correct office is less than 
1.4 minutes. A random sample of 25 incoming telephone calls has a standard 
deviation of 1.1 minutes. At α = 0.10, is there enough evidence to support the
company’s claim? Assume the population is normally distributed. 


SOLUTION 


Because the sample is random and the population is normally distributed, you
can use the chi-square test. The claim is “the standard deviation is less than
1.4 minutes.” So, the null and alternative hypotheses are
H₀: σ ≥ 1.4 minutes and Hₐ: σ < 1.4 minutes. (Claim)


The test is a left-tailed test, the level of significance is α = 0.10, and the
degrees of freedom are

d.f. = 25 − 1 = 24.

So, using Table 6, the critical value is

χ²₀ = 15.659.
The rejection region is χ² < 15.659. The standardized test statistic is

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{(25-1)(1.1)^2}{(1.4)^2} \approx 14.816$$

Use the chi-square test, assume σ = 1.4, and round to three decimal places.


The figure below shows the location of the rejection region and the standardized
test statistic χ². Because χ² is in the rejection region, you reject the null
hypothesis.

[Figure: chi-square distribution (horizontal axis 5 to 40) with rejection region χ² < χ²₀ = 15.659; χ² = 14.816 lies inside it.]


Interpretation There is enough evidence at the 10% level of significance to 
support the claim that the standard deviation of the lengths of time it takes an 
incoming telephone call to be transferred to the correct office is less than 1.4 
minutes. 


TRY IT YOURSELF 5 


A police chief claims that the standard deviation of the lengths of response times 
is less than 3.7 minutes. A random sample of 9 response times has a standard 
deviation of 3.0 minutes. At α = 0.05, is there enough evidence to support the
police chief’s claim? Assume the population is normally distributed. 

Answer: Page A37 


Picturing the World

A community center claims that the chlorine level in its pool has a standard
deviation of 0.46 parts per million (ppm). A sampling of the pool’s chlorine levels
at 25 random times during a month yields a standard deviation of 0.61 ppm.
(Adapted from American Pool Supply)

[Figure: histogram of the sampled chlorine levels; horizontal axis Chlorine level (ppm), 1.0 to 3.0; vertical axis Frequency.]

At α = 0.05, is there enough evidence to reject the claim?




Using a Hypothesis Test for the Population Variance 


A sporting goods manufacturer claims that the variance of the strengths 
of a certain fishing line is 15.9. A random sample of 15 fishing line spools 
has a variance of 21.8. At α = 0.05, is there enough evidence to reject the
manufacturer’s claim? Assume the population is normally distributed. 


SOLUTION 


Because the sample is random and the population is normally distributed, you
can use the chi-square test. The claim is “the variance is 15.9.” So, the null and
alternative hypotheses are
H₀: σ² = 15.9 (Claim) and Hₐ: σ² ≠ 15.9.


The test is a two-tailed test, the level of significance is α = 0.05, and the
degrees of freedom are

d.f. = 15 − 1 = 14.

Using Table 6, the critical values are χ²_L = 5.629 and χ²_R = 26.119. The rejection
regions are
χ² < 5.629 and χ² > 26.119.

The standardized test statistic is

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{(15-1)(21.8)}{15.9} \approx 19.195$$

Use the chi-square test, assume σ² = 15.9, and round to three decimal places.

The figure below shows the location of the rejection regions and the
standardized test statistic χ². Because χ² is not in the rejection regions, you fail
to reject the null hypothesis.

[Figure: chi-square distribution (horizontal axis 5 to 30) with rejection regions χ² < χ²_L = 5.629 and χ² > χ²_R = 26.119; χ² ≈ 19.195 lies between them.]
Interpretation There is not enough evidence at the 5% level of significance 
to reject the claim that the variance of the strengths of the fishing line is 15.9. 


TRY IT YOURSELF 6 


A company that offers dieting products and weight loss services claims that 
the variance of the weight losses of their users is 25.5. A random sample of 
13 users has a variance of 10.8. At α = 0.10, is there enough evidence to reject
the company’s claim? Assume the population is normally distributed. 

Answer: Page A37 




7.5 EXERCISES 


For Extra Help: MyLab Statistics 


Building Basic Skills and Vocabulary 


1. Explain how to find critical values in a chi-square distribution. 


2. Can a critical value for the chi-square test be negative? Explain. 


3. How do the requirements for a chi-square test for a variance or standard
deviation differ from a z-test or a t-test for a mean?


4. Explain how to test a population variance or a population standard deviation. 


In Exercises 5-12, find the critical value(s) and rejection region(s) for the type of
chi-square test with sample size n and level of significance α.

5. Right-tailed test, n = 27, α = 0.05
6. Right-tailed test, n = 10, α = 0.10
7. Left-tailed test, n = 7, α = 0.01
8. Left-tailed test, n = 24, α = 0.05
9. Two-tailed test, n = 81, α = 0.10
10. Two-tailed test, n = 61, α = 0.01
11. Right-tailed test, n = 30, α = 0.01
12. Two-tailed test, n = 31, α = 0.05


Graphical Analysis In Exercises 13 and 14, state whether each standardized
test statistic χ² allows you to reject the null hypothesis. Explain.

13. (a) χ² = 2.091   (b) χ² = 0   (c) χ² = 1.086   (d) χ² = 6.3471
14. (a) χ² = 22.302   (b) χ² = 23.309   (c) χ² = 8.457   (d) χ² = 8.477

[Figures: for Exercise 13, a chi-square curve with critical value χ²₀ = 6.251; for Exercise 14, a chi-square curve with critical values χ²_L = 8.547 and χ²_R = 22.307.]


In Exercises 15-22, test the claim about the population variance σ² or standard
deviation σ at the level of significance α. Assume the population is normally distributed.
15. Claim: σ² = 0.52; α = 0.05. Sample statistics: s² = 0.508, n = 18
16. Claim: σ² = 8.5; α = 0.05. Sample statistics: s² = 7.45, n = 23
17. Claim: σ² = 17.6; α = 0.01. Sample statistics: s² = 28.33, n = 41
18. Claim: σ² > 19; α = 0.1. Sample statistics: s² = 28, n = 17
19. Claim: σ² ≠ 32.8; α = 0.1. Sample statistics: s² = 40.9, n = 101
20. Claim: σ² = 63; α = 0.01. Sample statistics: s² = 58, n = 29
21. Claim: σ < 40; α = 0.01. Sample statistics: s = 40.8, n = 12
22. Claim: σ = 24.9; α = 0.10. Sample statistics: s = 29.1, n = 51




Using and Interpreting Concepts 


Hypothesis Testing Using Rejection Regions In Exercises 23-30,
(a) identify the claim and state H₀ and Hₐ, (b) find the critical value(s) and identify
the rejection region(s), (c) find the standardized test statistic χ², (d) decide whether
to reject or fail to reject the null hypothesis, and (e) interpret the decision in the
context of the original claim. Assume the population is normally distributed.


23. Helmets A helmet manufacturer claims that the variance of the thickness 
in a helmet model is 7.5. A random sample of 15 helmets has a variance of 
2.7. At α = 0.10, is there enough evidence to reject the claim?


24. Gas Mileage An auto manufacturer claims that the variance of the gas 
mileages in a model of hybrid vehicle is 0.16. A random sample of 30 vehicles 
has a variance of 0.26. At α = 0.05, is there enough evidence to reject the
claim? (Adapted from Green Hybrid) 


25. Mathematics Assessment Tests A school administrator claims that the 
standard deviation for grade 12 students on a mathematics assessment test 
is less than 35 points. A random sample of 28 grade 12 test scores has a 
standard deviation of 34 points. At α = 0.10, is there enough evidence to
support the claim? (Adapted from National Center for Educational Statistics) 


26. Vocabulary Assessment Tests A school administrator claims that the 
standard deviation for grade 12 students on a vocabulary assessment test 
is greater than 45 points. A random sample of 25 grade 12 test scores has 
a standard deviation of 46 points. At α = 0.01, is there enough evidence to
support the claim? (Adapted from National Center for Educational Statistics) 


27. Reading Days A library claims that the standard deviation of the reading
days for readers of a particular book is no more than 2 days. A random
sample of 30 readers has a standard deviation of 3 days. At α = 0.10, is there
enough evidence to reject the claim?

28. Hotel Room Rates A travel analyst claims that the standard deviation of
the room rates for two adults at three-star hotels in Denver is at least $68.
A random sample of 18 three-star hotels has a standard deviation of $40. At
α = 0.01, is there enough evidence to reject the claim? (Adapted from Expedia)

29. Salaries The annual salaries (in dollars) of 15 randomly chosen senior level
graphic design specialists are shown in the table at the left. At α = 0.05, is
there enough evidence to support the claim that the standard deviation of
the annual salaries is different from $10,300? (Adapted from Salary.com)

TABLE FOR EXERCISE 29
Annual salaries
47,262   67,363   81,246
65,876   59,649   78,268
88,549   52,130   73,955
91,288   54,476   86,787
66,923   48,337   70,172

30. Salaries The annual salaries (in dollars) of 12 randomly chosen nursing
supervisors are shown in the table at the left. At α = 0.10, is there enough
evidence to reject the claim that the standard deviation of the annual salaries
is $16,500? (Adapted from Salary.com)

TABLE FOR EXERCISE 30
Annual salaries
59,922   99,493    98,221
90,143   65,106    78,975
74,644   107,817   85,492
87,179   90,505    71,090


Extending Concepts 


P-Values You can calculate the P-value for a chi-square test using technology.
After calculating the standardized test statistic, use the cumulative distribution
function (CDF) to calculate the area under the curve. From Example 4 on page 419,
χ² = 43.2. Using a TI-84 Plus (choose 8 from the DISTR menu), enter 0 for
the lower bound, 43.2 for the upper bound, and 40 for the degrees of freedom,
as shown at the left. Because it is a right-tailed test, the P-value is approximately
1 − 0.6638 = 0.3362. Because P > α = 0.05, fail to reject H₀. In Exercises 31-34,
use the P-value method to perform the hypothesis test for the indicated exercise.
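The same area can be computed without a calculator. The Python sketch below assumes integer degrees of freedom and uses the closed-form chi-square tail-area formulas (a finite sum for even d.f., an erfc-based sum for odd d.f.) in place of the TI-84’s χ²cdf. The helper names are hypothetical. For a two-tailed test it doubles the smaller tail area, which is one common convention; check which convention your course or software uses.

```python
import math


def chi2_right_area(x, df):
    """Area to the right of x under the chi-square curve with integer df >= 1."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: e^(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * e^(-x/2) * sum_{r=1}^{(df-1)/2} x^(r-1) / (1*3*...*(2r-1))
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def chi2_p_value(chi2, df, tail):
    """P-value for a chi-square test; tail is "left", "right", or "two"."""
    right = chi2_right_area(chi2, df)
    if tail == "right":
        return right
    if tail == "left":
        return 1.0 - right
    if tail == "two":
        # One common convention: double the smaller tail area (capped at 1).
        return min(1.0, 2.0 * min(right, 1.0 - right))
    raise ValueError("tail must be 'left', 'right', or 'two'")


p = chi2_p_value(43.2, 40, "right")   # Example 4: right-tailed, d.f. = 40
print(round(p, 4), "reject H0" if p <= 0.05 else "fail to reject H0")
```

The printed P-value should agree with the approximately 0.3362 found above with the TI-84.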


31. Exercise 25 32. Exercise 26 33. Exercise 27 34. Exercise 28 


TI-84 PLUS
[Screen: χ²cdf(0,43.2,40) returns approximately 0.6638.]




A Summary of Hypothesis Testing 


With hypothesis testing, perhaps more than any other area of statistics, it can be 
difficult to see the forest for all the trees. To help you see the forest (the overall
picture), a summary of what you studied in this chapter is provided.


Writing the Hypotheses 
• You are given a claim about a population parameter μ, p, σ², or σ.

• Rewrite the claim and its complement using ≤, ≥, = and >, <, ≠.

• Identify the claim. Is it H₀ or Hₐ?


Specifying a Level of Significance 


• Specify α, the maximum acceptable probability of rejecting a valid H₀
(a type I error).


Specifying the Sample Size 


Study Tip 


Large sample sizes 
will usually increase 
the cost and effort of testing a 

hypothesis, but they also tend to 
make your decision more reliable. 


• Specify your sample size n.


Choosing the Test (◆ Normally distributed population   ● Any population)
• Mean: H₀ describes a hypothesized population mean μ.

◆ Use a z-test when σ is known and the population is normal.

● Use a z-test for any population when σ is known and n ≥ 30.

◆ Use a t-test when σ is not known and the population is normal.

● Use a t-test for any population when σ is not known and n ≥ 30.
• Proportion: H₀ describes a hypothesized population proportion p.

● Use a z-test for any population when np ≥ 5 and nq ≥ 5.

• Variance or Standard Deviation: H₀ describes a hypothesized population
variance σ² or standard deviation σ.

◆ Use a chi-square test when the population is normal.
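The “Choosing the Test” rules can be written as a small decision function. This Python sketch is illustrative only: the name `choose_test`, its arguments, and the returned labels are hypothetical, and it encodes only the cases listed above, returning a label that starts with “none” when no test from this chapter applies.

```python
from typing import Optional


def choose_test(parameter: str, n: int, normal_population: bool,
                sigma_known: bool = False, p0: Optional[float] = None) -> str:
    """Return the test suggested by the chapter summary (hypothetical helper)."""
    if parameter == "mean":
        # z-test when sigma is known, t-test when it is not; either one needs
        # a normal population or n >= 30.
        if normal_population or n >= 30:
            return "z-test" if sigma_known else "t-test"
        return "none: population not normal and n < 30"
    if parameter == "proportion":
        if p0 is None:
            raise ValueError("p0 is needed to check np >= 5 and nq >= 5")
        if n * p0 >= 5 and n * (1 - p0) >= 5:
            return "z-test"
        return "none: np or nq is less than 5"
    if parameter in ("variance", "standard deviation"):
        # The chi-square test is not robust, so a normal population is required.
        return "chi-square test" if normal_population else "none: population must be normal"
    raise ValueError("parameter must be 'mean', 'proportion', 'variance', or 'standard deviation'")


print(choose_test("mean", 40, normal_population=False, sigma_known=True))  # z-test
print(choose_test("proportion", 2202, normal_population=False, p0=0.51))    # z-test
print(choose_test("standard deviation", 25, normal_population=True))        # chi-square test
```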


Sketching the Sampling Distribution
• Use Hₐ to decide whether the test is left-tailed, right-tailed, or two-tailed.


Finding the Standardized Test Statistic
• Take a random sample of size n from the population.
• Compute the test statistic x̄, p̂, or s².


• Find the standardized test statistic z, t, or χ².


Making a Decision

Option 1. Decision based on rejection region

• Use α to find the critical value(s) z₀, t₀, or χ²₀ and rejection region(s).
• Decision Rule:

Reject H₀ when the standardized test statistic is in the rejection region.
Fail to reject H₀ when the standardized test statistic is not in the rejection region.


Option 2. Decision based on P-value
• Use the standardized test statistic or technology to find the P-value.
• Decision Rule:

Reject H₀ when P ≤ α.
Fail to reject H₀ when P > α.
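For a given α, the two options lead to the same decision. The Python sketch below checks this for a z-test on a grid of z values; the function names are hypothetical, and `statistics.NormalDist` from the standard library supplies the normal areas. The grid does not land exactly on a critical value, the one point where a strict rejection-region rule and the P ≤ α rule could differ.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def reject_by_region(z, alpha, tail):
    """Option 1: compare z with the critical value(s) z0."""
    if tail == "left":
        return z < STD_NORMAL.inv_cdf(alpha)
    if tail == "right":
        return z > STD_NORMAL.inv_cdf(1 - alpha)
    if tail == "two":
        return abs(z) > STD_NORMAL.inv_cdf(1 - alpha / 2)
    raise ValueError("tail must be 'left', 'right', or 'two'")


def reject_by_p_value(z, alpha, tail):
    """Option 2: compute the P-value and reject H0 when P <= alpha."""
    if tail == "left":
        p = STD_NORMAL.cdf(z)
    elif tail == "right":
        p = 1 - STD_NORMAL.cdf(z)
    elif tail == "two":
        p = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return p <= alpha


# Compare both options on z = -4.00, -3.99, ..., 4.00 for several levels and tails.
grid = [k / 100 for k in range(-400, 401)]
for alpha in (0.10, 0.05, 0.01):
    for tail in ("left", "right", "two"):
        assert all(reject_by_region(z, alpha, tail) == reject_by_p_value(z, alpha, tail)
                   for z in grid)
print("Both decision rules agree on every grid point.")
```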


Study Tip

When your standardized test statistic is z or t, remember that these values
measure standard deviations from the mean. Values that are outside of ±3
indicate that H₀ is very unlikely. Values that are outside of ±5 indicate that
H₀ is almost impossible.


A Summary of Hypothesis Testing 425 


z-Test for a Hypothesized Mean μ (σ Known) (Section 7.2)

Test statistic: x̄    Standardized test statistic: z
Sampling distribution of sample means is a normal distribution.

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

(x̄ = sample mean, μ = hypothesized mean, σ = population standard deviation, n = sample size)

[Figures: left-tailed, two-tailed, and right-tailed tests]

z-Test for a Hypothesized Proportion p (Section 7.4)
Test statistic: p̂    Standardized test statistic: z
Critical value: z₀ (Use Table 4.)
Sampling distribution of sample proportions is a normal distribution.

$$z = \frac{\hat{p} - p}{\sqrt{pq/n}}$$

(p̂ = sample proportion, p = hypothesized proportion, q = 1 − p, n = sample size)


t-Test for a Hypothesized Mean μ (σ Unknown) (Section 7.3)

Test statistic: x̄    Standardized test statistic: t
Critical value: t₀ (Use Table 5.)
Sampling distribution of sample means is approximated by a t-distribution with d.f. = n − 1.

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

(x̄ = sample mean, μ = hypothesized mean, s = sample standard deviation, n = sample size)

[Figures: left-tailed, two-tailed, and right-tailed tests]


Chi-Square Test for a Hypothesized Variance σ² or Standard Deviation σ (Section 7.5)

Test statistic: s²    Standardized test statistic: χ²
Critical value: χ²₀ (Use Table 6.)
Sampling distribution is approximated by a chi-square distribution with d.f. = n − 1.

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$

(n = sample size, s² = sample variance, σ² = hypothesized variance)

[Figures: left-tailed, two-tailed, and right-tailed tests]


USES AND ABUSES | Statistics in the Real World


Uses 


Hypothesis testing is important in many different fields because it gives a 
scientific procedure for assessing the validity of a claim about a population. Some 
of the concepts in hypothesis testing are intuitive, but some are not. For instance, 
the American Journal of Clinical Nutrition suggests that eating dark chocolate 
can help prevent heart disease. A random sample of healthy volunteers was
assigned to eat 3.5 ounces of dark chocolate each day for 15 days. After 15 days, 
the mean systolic blood pressure of the volunteers was 6.4 millimeters of mercury 
lower. A hypothesis test could show whether this drop in systolic blood pressure 
is significant or simply due to sampling error. 

Careful inferences must be made concerning the results. The study only 
examined the effects of dark chocolate, so the inference of health benefits cannot 
be extended to all types of chocolate. You also would not infer that you should 
eat large quantities of chocolate because the benefits must be weighed against 
known risks, such as weight gain and acid reflux. 


Abuses 


Not Using a Random Sample The entire theory of hypothesis testing is based 
on the fact that the sample is randomly selected. If the sample is not random, 
then you cannot use it to infer anything about a population parameter. 


Attempting to Prove the Null Hypothesis When the P-value for a hypothesis 
test is greater than the level of significance, you have not proven the null 
hypothesis is true—only that there is not enough evidence to reject it. For 
instance, with a P-value higher than the level of significance, a researcher could 
not prove that there is no benefit to eating dark chocolate—only that there is not 
enough evidence to support the claim that there is a benefit. 


Making Type I or Type II Errors Remember that a type I error is rejecting a
null hypothesis that is true and a type II error is failing to reject a null hypothesis
that is false. You can decrease the probability of a type I error by lowering the level
of significance α. Generally, when you decrease the probability of making a type I
error, you increase the probability β of making a type II error. Which error is more
serious? It depends on the situation. In a criminal trial, a type I error is considered
worse, as explained on page 374. If you are testing a person for a disease and they
are assumed to be disease-free (H₀), then a type II error is more serious because
you would fail to detect the disease even though the person has it. You can
decrease the chance of making both types of errors by increasing the sample size.
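The trade-off between α and β, and the effect of sample size, can be checked numerically. This sketch assumes a right-tailed z-test of H₀: μ = μ₀ when the true mean is μ₁ > μ₀, so β = P(z ≤ z₀) where z is centered at (μ₁ − μ₀)/(σ/√n). The numbers (μ₀ = 100, μ₁ = 103, σ = 10) and the helper name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def type_ii_probability(alpha: float, mu0: float, mu1: float, sigma: float, n: int) -> float:
    """beta for a right-tailed z-test of H0: mu = mu0 when the true mean is mu1 (> mu0)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)    # reject H0 when z > z_crit
    shift = (mu1 - mu0) / (sigma / sqrt(n))   # center of z when the true mean is mu1
    return STD_NORMAL.cdf(z_crit - shift)     # probability of failing to reject a false H0

# Hypothetical setting: mu0 = 100, true mean mu1 = 103, sigma = 10
beta_a05_n30 = type_ii_probability(0.05, 100, 103, 10, 30)
beta_a01_n30 = type_ii_probability(0.01, 100, 103, 10, 30)    # smaller alpha: beta grows
beta_a01_n120 = type_ii_probability(0.01, 100, 103, 10, 120)  # larger n: beta shrinks
print(round(beta_a05_n30, 3), round(beta_a01_n30, 3), round(beta_a01_n120, 3))
```

With n = 120 and α = 0.01, both error probabilities are lower than with n = 30 and α = 0.05, which is the point about increasing the sample size.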


EXERCISES

In Exercises 1–3, assume that you work for the Internal Revenue Service.
You are asked to write a report about the claim that 57% of U.S. adults
think the amount of federal income tax they pay is too high. (Source: Gallup)

[Figure: pie chart "Do You Consider the Amount of Federal Income Tax You Pay as Too High, About Right, or Too Low?" with About right: 37%, Too low: 6%]

1. What is the null hypothesis in this situation? Describe how your 
report could be incorrect by trying to prove the null hypothesis. 


2. Describe how your report could make a type I error. 


3. Describe how your report could make a type II error. 


426 CHAPTER 7 Hypothesis Testing with One Sample 




Chapter Summary 


What Did You Learn? | Example(s) | Review Exercises
Section 7.1
• How to state a null hypothesis and an alternative hypothesis | 1 | 1–8
• How to identify type I and type II errors | 2 | 7–10
• How to know whether to use a one-tailed or a two-tailed statistical test and find a P-value | 3 | 7–10
• How to interpret a decision based on the results of a statistical test | 4 | 7–10
• How to write a claim for a hypothesis test | 5 | 7–10
Section 7.2
• How to find and interpret P-values | 1–3 | 11, 12
• How to use P-values for a z-test for a mean μ when σ is known | 4–6 | 25, 26
• How to find critical values and rejection regions in the standard normal distribution | 7, 8 | 13–16
• How to use rejection regions for a z-test for a mean μ when σ is known | 9, 10 | 17–24, 27, 28
Section 7.3
• How to find critical values in a t-distribution | 1–3 | 29–34
• How to use the t-test to test a mean μ when σ is not known | 4, 5 | 35–42
• How to use technology to find P-values and use them with a t-test to test a mean μ when σ is not known | 6 | 43, 44
Section 7.4
• How to use the z-test to test a population proportion p | 1, 2 | 45–50
Section 7.5
• How to find critical values for a chi-square test | 1–3 | 51–54
• How to use the chi-square test to test a variance σ² or a standard deviation σ | 4–6 | 55–62




[Figure for Exercises 17–20: standard normal curve with critical values −z₀ = −1.645 and z₀ = 1.645]


Review Exercises 


Section 7.1 


In Exercises 1–6, the statement represents a claim. Write its complement and state
which is H₀ and which is Hₐ.

1. μ = 100    2. μ ≠ 48    3. p < 0.205
4. μ = 16    5. σ > 2.5    6. p = 0.4
In Exercises 7-10, (a) state the null and alternative hypotheses and identify which 
represents the claim, (b) describe type I and type II errors for a hypothesis test of the 
claim, (c) explain whether the hypothesis test is left-tailed, right-tailed, or two-tailed, 


(d) explain how you should interpret a decision that rejects the null hypothesis, and 
(e) explain how you should interpret a decision that fails to reject the null hypothesis. 


7. A polling organization reports that the proportion of U.S. adults who have 
volunteered their time or donated money to help clean up the environment 
is 65%. (Source: Rasmussen Reports) 


8. A car rental company claims that its cars are driven for at least 10,000 miles
on average.


9. A nonprofit consumer organization says that the standard deviation of the 
fuel economies of its top-rated vehicles for a recent year is no more than 
9.5 miles per gallon. (Adapted from Consumer Reports) 


10. A soft drink manufacturing company claims that the mean number of 
milligrams of sodium in one serving is less than 25. 


Section 7.2 


In Exercises 11 and 12, find the P-value for the hypothesis test with the standardized
test statistic z. Decide whether to reject H₀ for the level of significance α.

11. Left-tailed test, z = −1.18, α = 0.05
12. Two-tailed test, z = 2.57, α = 0.10
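The Table 4 lookups behind Exercises 11 and 12 can be sketched in code. This illustration assumes the decision rule "reject H₀ if P ≤ α"; the function names and the sample value z = 2.0 are hypothetical, chosen so as not to give away the exercise answers.

```python
from statistics import NormalDist

def p_value_from_z(z: float, tail: str) -> float:
    """P-value for a standardized test statistic z; tail is 'left', 'right', or 'two'."""
    std_normal = NormalDist()
    if tail == "left":
        return std_normal.cdf(z)             # area to the left of z
    if tail == "right":
        return std_normal.cdf(-z)            # area to the right of z
    if tail == "two":
        return 2 * std_normal.cdf(-abs(z))   # twice the area in the tail beyond |z|
    raise ValueError("tail must be 'left', 'right', or 'two'")

def decide(p: float, alpha: float) -> str:
    """Decision rule: reject H0 when P <= alpha."""
    return "reject H0" if p <= alpha else "fail to reject H0"

p = p_value_from_z(2.0, "two")  # hypothetical z = 2.0
print(f"P = {p:.4f}: {decide(p, 0.05)}")  # P = 0.0455: reject H0
```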


In Exercises 13–16, find the critical value(s) and rejection region(s) for the type of
z-test with level of significance α. Include a graph with your answer.

13. Left-tailed test, α = 0.02    14. Two-tailed test, α = 0.005
15. Right-tailed test, α = 0.025    16. Two-tailed test, α = 0.03
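Critical values like these can also be found with the inverse of the standard normal cumulative distribution, `NormalDist().inv_cdf`, instead of Table 4. This is a sketch with a hypothetical helper name; for instance, α = 0.10 in a two-tailed test gives ±1.645.

```python
from statistics import NormalDist

def z_critical_values(alpha: float, tail: str) -> tuple[float, ...]:
    """Critical value(s) z0 that bound the rejection region(s) of a z-test."""
    std_normal = NormalDist()
    if tail == "left":
        return (std_normal.inv_cdf(alpha),)        # reject H0 when z < z0
    if tail == "right":
        return (std_normal.inv_cdf(1 - alpha),)    # reject H0 when z > z0
    if tail == "two":
        z0 = std_normal.inv_cdf(1 - alpha / 2)     # alpha/2 in each tail
        return (-z0, z0)                           # reject H0 when z < -z0 or z > z0
    raise ValueError("tail must be 'left', 'right', or 'two'")

print([round(v, 3) for v in z_critical_values(0.10, "two")])  # [-1.645, 1.645]
```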


In Exercises 17-20, state whether the standardized test statistic z allows you to 
reject the null hypothesis. Explain your reasoning. 


17. z = 1.631    18. z = 1.723    19. z = −1.464    20. z = −1.655


In Exercises 21–24, test the claim about the population mean μ at the level of
significance α. Assume the population is normally distributed.

21. Claim: μ = 3725; α = 0.10; σ = 121. Sample statistics: x̄ = 3748, n = 30
22. Claim: μ < 8.25; α = 0.01; σ = 0.017. Sample statistics: x̄ = 8.246, n = 40
23. Claim: μ ≠ 21.25; α = 0.03; σ = 4.35. Sample statistics: x̄ = 19.7, n = 60
24. Claim: μ = 930; α = 0.10; σ = 30. Sample statistics: x̄ = 937, n = 30




In Exercises 25 and 26, (a) identify the claim and state H₀ and Hₐ, (b) find the
standardized test statistic z, (c) find the corresponding P-value, (d) decide whether 
to reject or fail to reject the null hypothesis, and (e) interpret the decision in the 
context of the original claim. 


25. Cotton Production A researcher claims that the mean annual production 
of cotton is 3.5 million bales per country. A random sample of 44 countries 
has a mean annual production of 2.1 million bales. Assume the population 
standard deviation is 4.5 million bales. At α = 0.05, can you reject the
claim? (Source: U.S. Department of Agriculture) 


26. Cotton Consumption A researcher claims that the mean annual consumption
of cotton is greater than 1.1 million bales per country. A random sample of
67 countries has a mean annual consumption of 1.0 million bales. Assume
the population standard deviation is 4.3 million bales. At α = 0.01, can you
support the claim? (Source: U.S. Department of Agriculture)


In Exercises 27 and 28, (a) identify the claim and state H₀ and Hₐ, (b) find the
critical value(s) and identify the rejection region(s), (c) find the standardized test 
statistic z, (d) decide whether to reject or fail to reject the null hypothesis, and 
(e) interpret the decision in the context of the original claim. 


27. An environmental researcher claims that the mean amount of sulfur dioxide
in the air in U.S. cities is 1.15 parts per billion. In a random sample of
134 U.S. cities, the mean amount of sulfur dioxide in the air is 0.93 parts per
billion. Assume the population standard deviation is 2.62 parts per billion.
At α = 0.01, is there enough evidence to reject the claim? (Source: U.S.
Environmental Protection Agency)


28. A travel analyst claims that the mean price of a round-trip flight from New
York City to Los Angeles is less than $507. In a random sample of 55 round-trip
flights from New York City to Los Angeles, the mean price is $502.
Assume the population standard deviation is $111. At α = 0.05, is there
enough evidence to support the travel analyst’s claim? (Adapted from Expedia)


Section 7.3 


In Exercises 29–34, find the critical value(s) and rejection region(s) for the type of
t-test with level of significance α and sample size n.

29. Two-tailed test, α = 0.05, n = 18
30. Left-tailed test, α = 0.025, n = 39
31. Right-tailed test, α = 0.05, n = 12
32. Two-tailed test, α = 0.01, n = 20
33. Left-tailed test, α = 0.005, n = 15
34. Two-tailed test, α = 0.02, n = 12


In Exercises 35–40, test the claim about the population mean μ at the level of
significance α. Assume the population is normally distributed.

35. Claim: μ > 12,700; α = 0.005. Sample statistics: x̄ = 12,855, s = 248, n = 21
36. Claim: μ = 0; α = 0.10. Sample statistics: x̄ = −0.45, s = 2.38, n = 31
37. Claim: μ = 51; α = 0.01. Sample statistics: x̄ = 52, s = 2.5, n = 40
38. Claim: μ < 850; α = 0.025. Sample statistics: x̄ = 875, s = 25, n = 14
39. Claim: μ = 195; α = 0.10. Sample statistics: x̄ = 190, s = 36, n = 101
40. Claim: μ ≠ 3,330,000; α = 0.05. Sample statistics: x̄ = 3,293,995, s = 12,801, n = 35




In Exercises 41 and 42, (a) identify the claim and state H₀ and Hₐ, (b) find the
critical value(s) and identify the rejection region(s), (c) find the standardized test 
statistic t, (d) decide whether to reject or fail to reject the null hypothesis, and 
(e) interpret the decision in the context of the original claim. Assume the population 
is normally distributed. 


41. A fitness magazine advertises that the mean monthly cost of joining a health 
club is $25. You want to test this claim. You find that a random sample of 
18 clubs has a mean monthly cost of $26.25 and a standard deviation of $3.23. 
At α = 0.10, do you have enough evidence to reject the advertisement’s
claim? 


42. A fitness magazine claims that the mean cost of a yoga session is no more 
than $14. You want to test this claim. You find that a random sample of 
32 yoga sessions has a mean cost of $15.59 and a standard deviation of 
$2.60. At α = 0.025, do you have enough evidence to reject the magazine’s
claim? 


In Exercises 43 and 44, (a) identify the claim and state H₀ and Hₐ, (b) use
technology to find the P-value, (c) decide whether to reject or fail to reject the
null hypothesis, and (d) interpret the decision in the context of the original claim.
Assume the population is normally distributed.

43. An education publication claims that the mean score for grade 12
students on a science achievement test is more than 145. You want
to test this claim. You randomly select 36 grade 12 test scores. The
results are listed below. At α = 0.1, can you support the publication’s
claim? (Adapted from National Center for Education Statistics)


188 80 175 195 201 143 119 81 118 119 165 222 
109 134 200 110 199 181 79 135 124 205 90 120 
216 167 198 183 173 187 143 166 147 219 206 97 


44. An education researcher claims that the overall average score of 
15-year-old students on an international mathematics literacy test is 
494. You want to test this claim. You randomly select the average 
scores of 33 countries. The results are listed below. At a = 0.05, do you 
have enough evidence to reject the researcher’s claim? (Source: National 
Center for Education Statistics) 


561 554 536 531 523 518 515 511 506 500 499
493 490 489 485 482 482 479 477 466 453 448 
439 432 423 421 413 407 394 388 386 376 368 
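One way to do the "use technology" step in Exercise 43 is sketched below with Python's `statistics` module, which computes the sample mean and the sample standard deviation (dividing by n − 1). The standard library has no t-distribution, so the P-value itself would come from a calculator or a library such as SciPy (for a right-tailed test, `scipy.stats.t.sf(t, n - 1)`); this sketch stops at the standardized test statistic.

```python
from math import sqrt
from statistics import mean, stdev

# Grade 12 science scores from Exercise 43 (n = 36)
scores = [
    188, 80, 175, 195, 201, 143, 119, 81, 118, 119, 165, 222,
    109, 134, 200, 110, 199, 181, 79, 135, 124, 205, 90, 120,
    216, 167, 198, 183, 173, 187, 143, 166, 147, 219, 206, 97,
]

n = len(scores)
x_bar = mean(scores)   # sample mean
s = stdev(scores)      # sample standard deviation (divides by n - 1)
mu0 = 145              # hypothesized mean from the publication's claim
t = (x_bar - mu0) / (s / sqrt(n))
print(f"n = {n}, x_bar = {x_bar:.2f}, s = {s:.2f}, t = {t:.3f}, d.f. = {n - 1}")
```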




Section 7.4 


In Exercises 45–48, determine whether a normal sampling distribution can be used
to approximate the binomial distribution. If it can, test the claim.
45. Claim: p = 0.15; α = 0.05
Sample statistics: p̂ = 0.09, n = 40
46. Claim: p = 0.65; α = 0.03
Sample statistics: p̂ = 0.76, n = 116
47. Claim: p < 0.70; α = 0.01
Sample statistics: p̂ = 0.50, n = 68
48. Claim: p = 0.04; α = 0.10
Sample statistics: p̂ = 0.03, n = 30
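The first step in Exercises 45–48 is the usual check that np ≥ 5 and nq ≥ 5. A minimal sketch follows; the helper name and the two sample inputs are hypothetical rather than taken from the exercises.

```python
def normal_approximation_ok(p: float, n: int) -> bool:
    """True when np >= 5 and nq >= 5, so a normal distribution can approximate the binomial."""
    q = 1 - p
    return n * p >= 5 and n * q >= 5

print(normal_approximation_ok(0.50, 20))   # True: np = 10, nq = 10
print(normal_approximation_ok(0.02, 100))  # False: np = 2 < 5
```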




In Exercises 49 and 50, (a) identify the claim and state H₀ and Hₐ, (b) find the
critical value(s) and identify the rejection region(s), (c) find the standardized test 
statistic z, (d) decide whether to reject or fail to reject the null hypothesis, and 
(e) interpret the decision in the context of the original claim. 


49. A polling agency reports that over 40% of U.S. adults say they are less likely 
to travel to Europe in the next six months for fear of terrorist attacks. In a 
random sample of 1000 U.S. adults, 42% said they are less likely to travel 
to Europe in the next six months for fear of terrorist attacks. At α = 0.01,
is there enough evidence to support the agency’s claim? (Adapted from 
Rasmussen Reports) 


50. A labor researcher claims that 6% of U.S. employees say it is likely they 
will be laid off in the next year. In a random sample of 547 U.S. employees, 
44 said it is likely they will be laid off in the next year. At α = 0.05, is there
enough evidence to reject the researcher’s claim? (Adapted from Gallup) 


Section 7.5 


In Exercises 51–54, find the critical value(s) and rejection region(s) for the type of
chi-square test with sample size n and level of significance α.

51. Right-tailed test, n = 20, α = 0.05
52. Two-tailed test, n = 14, α = 0.01
53. Two-tailed test, n = 41, α = 0.10
54. Left-tailed test, n = 6, α = 0.05


In Exercises 55–58, test the claim about the population variance σ² or standard
deviation σ at the level of significance α. Assume the population is normally
distributed.

55. Claim: σ² > 2; α = 0.10. Sample statistics: s² = 2.95, n = 18
56. Claim: σ² = 60; α = 0.025. Sample statistics: s² = 72.7, n = 15
57. Claim: σ = 1.25; α = 0.05. Sample statistics: s = 1.03, n = 6
58. Claim: σ ≠ 0.035; α = 0.01. Sample statistics: s = 0.026, n = 16


In Exercises 59 and 60, (a) identify the claim and state H₀ and Hₐ, (b) find the
critical value(s) and identify the rejection region(s), (c) find the standardized test
statistic χ², (d) decide whether to reject or fail to reject the null hypothesis, and
(e) interpret the decision in the context of the original claim. Assume the population 
is normally distributed. 


59. A bolt manufacturer makes a type of bolt to be used in airtight containers. 
The manufacturer claims that the variance of the bolt widths is at most 0.01. 
A random sample of 28 bolts has a variance of 0.064. At α = 0.005, is there
enough evidence to reject the claim? 


60. A restaurant claims that the standard deviation of the lengths of serving 
times is 3 minutes. A random sample of 27 serving times has a standard 
deviation of 3.9 minutes. At α = 0.01, is there enough evidence to reject the
claim? 


61. In Exercise 59, is there enough evidence to reject the claim at the α = 0.01
level? Explain. 


62. In Exercise 60, is there enough evidence to reject the claim at the α = 0.05
level? Explain. 




Chapter Quiz 


Take this quiz as you would take a quiz in class. After you are done, check your
work against the answers given in the back of the book.

For each exercise, perform the steps below.

(a) Identify the claim and state H₀ and Hₐ.

(b) Determine whether the hypothesis test is left-tailed, right-tailed, or two-tailed,
and whether to use a z-test, a t-test, or a chi-square test. Explain your reasoning.

(c) Choose one of the options.

Option 1: Find the critical value(s), identify the rejection region(s), and find
the appropriate standardized test statistic.

Option 2: Find the appropriate standardized test statistic and the P-value.

(d) Decide whether to reject or fail to reject the null hypothesis.

(e) Interpret the decision in the context of the original claim.

1. A hat company claims that the mean hat size for a male is at least 7.25.
A random sample of 12 hat sizes has a mean of 7.15. At α = 0.01, can you
reject the company’s claim? Assume the population is normally distributed
and the population standard deviation is 0.27.


2. A travel analyst claims the mean daily base price for renting a full-size or less
expensive vehicle in Vancouver, Washington, is more than $36. You want to
test this claim. In a random sample of 40 full-size or less expensive vehicles
available to rent in Vancouver, Washington, the mean daily base price is $42.
Assume the population standard deviation is $19. At α = 0.10, do you have
enough evidence to support the analyst’s claim? (Adapted from Expedia)


3. A government agency reports that the mean amount of earnings for full-time
workers ages 18 to 24 with a bachelor’s degree in a recent year is $47,254.
In a random sample of 15 full-time workers ages 18 to 24 with a bachelor’s
degree, the mean amount of earnings is $50,781 and the standard deviation is
$5290. At α = 0.05, is there enough evidence to support the claim? Assume
the population is normally distributed. (Adapted from U.S. Census Bureau)


4. A weight loss program claims that program participants have a mean
weight loss of at least 10.5 pounds after 1 month. The weight losses after
1 month (in pounds) of a random sample of 40 program participants
are listed below. At α = 0.01, is there enough evidence to reject the
program’s claim?

4.7 6.0 7.2 8.3 9.2 10.1 14.0 11.7 12.8 10.8
11.0 7.2 8.0 4.7 11.8 10.7 6.1 8.8 7.7 8.5
9.5 10.2 5.6 6.9 7.9 8.6 10.5 9.6 5.7 9.6
12.6 12.9 6.8 12.0 5.1 14.0 9.7 10.8 9.1 12.9

5. A nonprofit consumer organization says that less than 18% of the vehicles
the organization rated in a recent year have an overall score of 78 or more.
In a random sample of 90 vehicles the organization rated in a recent year,
20% have an overall score of 78 or more. At α = 0.05, can you support the
organization’s claim? (Adapted from Consumer Reports)


6. In Exercise 5, the nonprofit consumer organization says that the standard
deviation of the vehicle rating scores is 11.90. A random sample of 90 vehicle
rating scores has a standard deviation of 11.96. At α = 0.10, is there enough
evidence to reject the organization’s claim? Assume the population is
normally distributed. (Adapted from Consumer Reports)




Chapter Test 


Take this test as you would take a test in class. 
For each exercise, perform the steps below. 


(a) Identify the claim and state H₀ and Hₐ.


(b) Determine whether the hypothesis test is left-tailed, right-tailed, or two-tailed, 
and whether to use a z-test, a t-test, or a chi-square test. Explain your reasoning. 


(c) Choose one of the options. 


Option 1: Find the critical value(s), identify the rejection region(s), and find 
the appropriate standardized test statistic. 


Option 2: Find the appropriate standardized test statistic and the P-value. 
(d) Decide whether to reject or fail to reject the null hypothesis. 


(e) Interpret the decision in the context of the original claim. 


1. A retail grocery chain owner claims that more than 30% of adults have 
purchased a meal kit in a recent year. In a random sample of 36 adults, 25% 
have purchased a meal kit in a recent year. At α = 0.10, is there enough
evidence to support the owner’s claim? (Adapted from Harris Interactive) 


2. A travel analyst claims that the mean of the room rates for two adults at 
three-star hotels in Salt Lake City is $134. In a random sample of 37 three-star 
hotels in Salt Lake City, the mean room rate for two adults is $143. Assume 
the population standard deviation is $30. At α = 0.10, is there enough
evidence to reject the analyst’s claim? (Adapted from Expedia) 


3. A travel analyst says that the mean price of a meal for a family of 4 in a resort 
restaurant is at most $100. A random sample of 33 meal prices for families of 
4 has a mean of $110 and a standard deviation of $19. At α = 0.01, is there
enough evidence to reject the analyst’s claim? 


4. A research center claims that more than 80% of U.S. adults think that 
mothers should have paid maternity leave. In a random sample of 50 U.S. 
adults, 82% think that mothers should have paid maternity leave. At 
α = 0.05, is there enough evidence to support the center’s claim? (Adapted
from Pew Research Center) 


5. A nutrition bar manufacturer claims that the standard deviation of the 
number of grams of carbohydrates in a bar is 1.11 grams. A random sample 
of 26 bars has a standard deviation of 1.19 grams. At α = 0.05, is there
enough evidence to reject the manufacturer’s claim? Assume the population 
is normally distributed. 


6. A nonprofit consumer organization says that the mean price of the vehicles 
the organization rated in a recent year is at least $41,000. In a random sample 
of 150 vehicles the organization rated in a recent year, the mean price is 
$40,600 and the standard deviation is $17,300. At α = 0.01, is there enough
evidence to reject the organization’s claim? (Adapted from Consumer Reports) 


7. A researcher claims that the mean age of the residents of a small
town is more than 38 years. The ages (in years) of a random sample 
of 30 residents are listed below. At α = 0.10, is there enough evidence
to support the researcher’s claim? Assume the population standard 
deviation is 9 years. 


41 44 40 30 29 46 42 53 21 29 43 46 39 35 33 
42 35 43 35 24 21 29 24 25 85 56 82 87 72 31 


Putting it all together 


REAL DECISIONS 


The charts show results of studies on four-year colleges in the United States. 
You want to portray your college in a positive light for an advertising campaign 
designed to attract high school students. You decide to use hypothesis tests to 
show that your college is better than the average in certain aspects. 


EXERCISES

1. What Would You Test?
What claims could you test if you wanted to convince a student to come
to your college? Suppose the student you are trying to convince is mainly
concerned with (a) affordability, (b) having a good experience, and (c)
graduating and starting a career. List one claim for each case. State the null
and alternative hypotheses for each claim.

2. Choosing a Random Sample
Classmates suggest conducting the following sampling techniques to test
various claims. Determine whether the sample will be random. If not,
suggest an alternative.
(a) Survey all the students you have class with and ask about the average
time they spend daily on different activities.
(b) Randomly select former students from a list of recent graduates and
ask whether they are employed.
(c) Randomly select students from a directory, ask how much money
they borrowed to pay for college this year, and multiply by four.

3. Supporting a Claim
You want your test to support a positive claim about your college, not just
fail to reject one. Should you state your claim so that the null hypothesis
contains the claim or the alternative hypothesis contains the claim? Explain.

4. Testing a Claim
You want to claim that students at your college graduate with an average
debt of less than $25,000. A random sample of 40 recent graduates has a
mean amount borrowed of $23,475 and a standard deviation of $8000. At
α = 0.05, is there enough evidence to support your claim?

5. Testing a Claim
You want to claim that your college has a freshman retention rate of at least
80%. You take a random sample of 60 of last year’s freshmen and find that
54 of them still attend your college. At α = 0.05, is there enough evidence
to reject your claim?

6. Conclusion
Test one of the claims you listed in Exercise 1 and interpret the results.
Discuss any limits of your sampling process.

[Figure: three bar charts. College Success: freshman retention rate and 4-year graduation rate (axis: Percent). College Cost: annual tuition (public, in-state; public, out-of-state; private), amount borrowed, and need-based scholarship or grants (axis: Amount). Student Daily Life: sleeping, leisure and sports, traveling, dining, and other activities (axis: Average (in hours))]




TECHNOLOGY 


TI-84 PLUS 


The Case of the Vanishing Women 


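[Figure: proportion of women at each stage of jury selection: 53% (city directory), 29% (350 people selected), 9% (venire), 0% (jury)]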


From 1966 to 1968, Dr. Benjamin Spock and others were 
tried for conspiracy to violate the Selective Service Act by 
encouraging resistance to the Vietnam War. By a series of 
three selections, no women ended up being on the jury. 
In 1969, Hans Zeisel wrote an article in The University 
of Chicago Law Review using statistics and hypothesis 
testing to argue that the jury selection was biased against 
Dr. Spock. Dr. Spock was a well-known pediatrician 
and author of books about raising children. Millions of 
mothers had read his books and followed his advice. 
Zeisel argued that, by keeping women off the jury, the 
court prejudiced the verdict. 

The jury selection process for Dr. Spock’s trial is
described below.


1. The Minitab display below shows a hypothesis test 
for a claim that the proportion of women in the 
city directory is p = 0.53. In the test, n = 350 and
p̂ ≈ 0.291. Should you reject the claim? What is the
level of significance? Explain. 


2. In Exercise 1, you rejected the claim that p = 0.53. 
But this claim was true. What type of error is this? 


3. When you reject a true claim with a level of significance 
that is virtually zero, what can you infer about the 
randomness of your sampling process? 


Test and CI for One Proportion

Test of p = 0.53 vs p ≠ 0.53

Sample   X     N     Sample p   99% CI                 Z-Value   P-Value
1        102   350   0.291429   (0.228862, 0.353995)   −8.94     0.000

Using the normal approximation.


Stage 1. The clerk of the Federal District Court selected
350 people “at random” from the Boston City Directory.
The directory contained several hundred names, 53% of
whom were women. However, only 102 of the 350 people
selected were women.


Stage 2. The trial judge, Judge Ford, selected 100 people
“at random” from the 350 people. This group was called
a venire and it contained only nine women.


Stage 3. The court clerk assigned numbers to the members
of the venire and, one by one, they were interrogated by
the attorneys for the prosecution and defense until 12
members of the jury were chosen. At this stage, only
one potential female juror was questioned, and she
was eliminated by the prosecutor under his quota of
peremptory challenges (for which he did not have to give
a reason).


4. Describe a hypothesis test for Judge Ford’s “random”
selection of the venire. Use a claim of p = 102/350 ≈ 0.291.
(a) Write the null and alternative hypotheses.
(b) Use technology to perform the test.
(c) Make a decision.
(d) Interpret the decision in the context of the original
claim. Could Judge Ford’s selection of 100 venire
members have been random?


Extended solutions are given in the technology manuals that accompany this text. 


Technical instruction is provided for Minitab, Excel, and the TI-84 Plus. 
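Exercise 1's Minitab result can be reproduced with a short sketch (the helper name is hypothetical). The two-tailed P-value is computed with `math.erfc`, because 2(1 − Φ(|z|)) evaluated naively underflows to 0 this far into the tail; Minitab's "0.000" is the same tiny value rounded to three decimal places.

```python
from math import erfc, sqrt

def one_prop_z(x: int, n: int, p0: float) -> tuple[float, float]:
    """Return (z, two-tailed P-value) for a one-proportion z-test (normal approximation)."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) but keeps precision far out in the tail
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value

# Stage 1: 102 women among the 350 people selected, tested against p = 0.53
z, p_value = one_prop_z(x=102, n=350, p0=0.53)
print(f"z = {z:.2f}, P = {p_value:.1e}")
```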




Using Technology to Perform Hypothesis Tests 


Here are some Minitab and TI-84 Plus printouts for some of the examples in this chapter. 


[Minitab menu screenshots (three): Display Descriptive Statistics..., Store Descriptive Statistics..., Graphical Summary..., 1-Sample Z..., 1-Sample t..., 2-Sample t..., Paired t..., 1 Proportion..., 2 Proportions...]


See Example 5, page 389. 


One-Sample Z

Test of μ = 68.3 vs ≠ 68.3
The assumed standard deviation = 3.5

N    Mean     SE Mean   95% CI             Z       P
25   67.200   0.700     (65.828, 68.572)   −1.57   0.116


See Example 4, page 402. 


One-Sample T

Test of μ = 21000 vs < 21000

N    Mean    StDev   SE Mean   95% Upper Bound   T       P
14   19189   2950    788       …                 −2.30   …


See Example 2, page 412. 


Test and CI for One Proportion

Test of p = 0.51 vs p ≠ 0.51

Sample   X      N      Sample p   90% CI                 Z-Value   P-Value
1        1161   2202   0.527248   (0.509748, 0.544748)   1.62      0.105

Using the normal approximation.




See Example 9, page 393.

[TI-84 Plus screen: EDIT CALC TESTS menu listing Z-Test..., T-Test..., 2-SampZTest..., 2-SampTTest..., 1-PropZTest..., 2-PropZTest..., ZInterval...]

Z-Test
Inpt:Data Stats
μ0:88200
σ:9500
x̄:85900
n:20
μ:≠μ0 <μ0 >μ0
Calculate Draw

Z-Test
μ<88200
z=−1.082727…
p=.1394646984
x̄=85900
n=20


See Example 5, page 403.

[TI-84 Plus screen: EDIT CALC TESTS menu]

T-Test
Inpt:Data Stats
μ0:6.8
x̄:6.7
Sx:.35
n:39
μ:≠μ0 <μ0 >μ0
Calculate Draw

T-Test
μ≠6.8
t=−1.784285142
p=.0823638462
x̄=6.7
Sx=.35
n=39

[TI-84 Plus screen: Draw output of the t-distribution with t=−1.7843]


See Example 1, page 411.

[TI-84 Plus screen: EDIT CALC TESTS menu]

1-PropZTest
p0:.45
x:41
n:100
prop:≠p0 <p0 >p0
Calculate Draw

1-PropZTest
prop<.45
z=−.8040302522
p=.2106896879
p̂=.41
n=100




CHAPTER 8

Hypothesis Testing with Two Samples


According to a study published in the Journal of General Internal Medicine, 50% of yoga 
users are college-educated while only 23% of non-yoga users are college-educated. 




8.1 Testing the Difference Between Means (Independent Samples, σ₁ and σ₂ Known)
8.2 Testing the Difference Between Means (Independent Samples, σ₁ and σ₂ Unknown)
Case Study
8.3 Testing the Difference Between Means (Dependent Samples)
8.4 Testing the Difference Between Proportions
Uses and Abuses
Real Statistics – Real Decisions
Technology


Where You’ve Been


In Chapter 6, you were introduced to inferential statistics 
and you learned how to form a confidence interval to 
estimate a population parameter. Then, in Chapter 7, you 
learned how to test a claim about a population parameter, 
basing your decision on sample statistics and their sampling 
distributions. 


Using data from the National Health Interview Survey, a 
study was conducted to analyze the characteristics of yoga 
users and non-yoga users. The study was published in the 
Journal of General Internal Medicine. Some of the results 
are shown below for a random sample of yoga users. 


Yoga Users (n = 1593) 


Characteristic Frequency Proportion 
40 to 49 years old 367 0.2304 
Income of $20,000 to $34,999 239 0.1500 
Non-smoking 1323 0.8305 


Where You’re Going


In this chapter, you will continue your study of inferential 
statistics and hypothesis testing. Now, however, instead 
of testing a hypothesis about a single population, you 
will learn how to test a hypothesis that compares two 
populations. 


For instance, in the yoga study, a random sample of 
non-yoga users was also surveyed. Here are the study’s 
findings for this second group. 


Non-Yoga Users (n = 29,948)


Characteristic Frequency Proportion
40 to 49 years old 6,290 0.2100
Income of $20,000 to $34,999 5,990 0.2000
Non-smoking 23,360 0.7800


From these two samples, can you conclude that there is a
difference in the proportion of 40- to 49-year-olds, people
with an income of $20,000 to $34,999, or non-smokers
between yoga users and non-yoga users? Or, might the
differences in the proportions be due to chance?


In this chapter, you will learn to answer these questions by 
testing the hypothesis that the two proportions are equal. 
For instance, for non-smokers, you can conclude that the 
proportion of yoga users is different from the proportion of 
non-yoga users. 
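For readers with Python available, the non-smoker comparison can be sketched with the kind of two-proportion z-test covered in Section 8.4. This is a minimal sketch, assuming the usual pooled-proportion standard error; the helper name two_proportion_z is hypothetical, and statistics.NormalDist (Python 3.8 or later) stands in for the standard normal table.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1: int, n1: int, x2: int, n2: int):
    """Pooled two-proportion z-test of H0: p1 = p2 versus Ha: p1 != p2.

    Returns (z, P) where P is the two-tailed P-value.
    """
    p1_hat = x1 / n1                   # sample proportion, group 1
    p2_hat = x2 / n2                   # sample proportion, group 2
    p_bar = (x1 + x2) / (n1 + n2)      # pooled proportion, assuming p1 = p2
    q_bar = 1 - p_bar
    se = sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Non-smokers: 1323 of 1593 yoga users vs. 23,360 of 29,948 non-yoga users
z, p = two_proportion_z(1323, 1593, 23360, 29948)
print(f"z = {z:.2f}, P = {p:.2g}")
```

Because both samples are large, the standard error is small, and the difference of about 5 percentage points (0.8305 versus 0.7800) lies many standard errors from 0.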




What You Should Learn 


» How to determine whether two
samples are independent or
dependent


» An introduction to two-sample
hypothesis testing for the
difference between two
population parameters


» How to perform a two-sample
z-test for the difference between
two means μ₁ and μ₂ using
independent samples with σ₁
and σ₂ known


Figure: Independent Samples (top) and Dependent Samples (bottom), each showing Sample 1 and Sample 2.


Study Tip 


Dependent samples often
involve before and after
results for the same
person or object (such as
a person's weight before
starting a diet and after
6 weeks), or results of individuals
matched for specific characteristics
(such as identical twins).


Testing the Difference Between Means
(Independent Samples, σ₁ and σ₂ Known)


CHAPTER 8 Hypothesis Testing with Two Samples 


Independent and Dependent Samples • An Overview of Two-Sample
Hypothesis Testing • Two-Sample z-Test for the Difference Between Means


Independent and Dependent Samples 


In Chapter 7, you studied methods for testing a claim about the value of 
a population parameter. In this chapter, you will learn how to test a claim 
comparing parameters from two populations. Before learning how to test the 
difference between two parameters, you need to understand the distinction 
between independent samples and dependent samples. 


DEFINITION 


Two samples are independent when the sample selected from one population
is not related to the sample selected from the second population (see top
figure at the left). Two samples are dependent when each member of one
sample corresponds to a member of the other sample (see bottom figure at the
left). Dependent samples are also called paired samples or matched samples.


EXAMPLE 1

Independent and Dependent Samples
Classify each pair of samples as independent or dependent.


1. Sample 1: Triglyceride levels of 70 patients 
Sample 2: Triglyceride levels of the same 70 patients after using a 
triglyceride-lowering drug for 6 months. 


2. Sample 1: Scores for 38 adult males on a psychological screening test for 
attention-deficit/hyperactivity disorder 

Sample 2: Scores for 50 adult females on a psychological screening test for 
attention-deficit/hyperactivity disorder 


SOLUTION 


1. These samples are dependent. Because the triglyceride levels of the same 
patients are taken, the samples are related. The samples can be paired with 
respect to each patient. 


2. These samples are independent. It is not possible to form a pairing between
the members of the samples, the sample sizes are different, and the data
represent scores for different individuals.


TRY IT YOURSELF 1 
Classify each pair of samples as independent or dependent. 


1. Sample 1: Systolic blood pressures of 30 adult females 
Sample 2: Systolic blood pressures of 30 adult males 
2. Sample 1: Midterm exam scores of 14 chemistry students 
Sample 2: Final exam scores of the same 14 chemistry students 
Answer: Page A37 


SECTION 8.1 


Study Tip 


In the figures at the right,
the members in the two
samples, adults ages 18 to
34 and adults ages 35 to
49, are not matched or
paired, so the samples
are independent.




An Overview of Two-Sample Hypothesis Testing 


In this section and the next, you will learn how to test a claim comparing the 
means of two different populations using independent samples. 

For instance, an advertiser is developing a marketing plan and wants to 
determine whether there is a difference in the amounts of time adults ages 18 
to 34 and adults ages 35 to 49 spend on social media each day. The only way 
to conclude with certainty that there is a difference is to take a census of all 
adults in both age groups, calculate their mean daily times spent on social media, 
and find the difference. Of course, it is not practical to take such a census. 
However, it is possible to determine with some degree of certainty whether such 
a difference exists. 

To determine whether a difference exists, the advertiser begins by assuming 
that there is no difference in the mean times of the two populations. That is, 


μ₁ − μ₂ = 0.    Assume there is no difference.


Then, by taking a random sample from each population, a two-sample hypothesis 
test is performed using the test statistic 

x̄₁ − x̄₂.    Test statistic

The advertiser obtains the results shown in the next two figures. 


Figure: Random samples of adults ages 18 to 34 and adults ages 35 to 49, drawn from the adults not in the samples. Adults ages 35 to 49: x̄₂ = 59 min, s₂ = 15 min, n₂ = 150.


The figure below shows the sampling distribution of x̄₁ − x̄₂ for many similar
samples taken from two populations for which μ₁ − μ₂ = 0. The figure also
shows the test statistic and the standardized test statistic.


Figure: Sampling distribution of x̄₁ − x̄₂ (difference in sample means, in minutes). Test statistic: x̄₁ − x̄₂ = 55 − 59 = −4. Standardized test statistic: t ≈ −2.612.


From the figure, you can see that it is quite unlikely to obtain sample means 
that differ by 4 minutes assuming the actual difference is 0. The difference of the 
sample means would be more than 2.5 standard errors from the hypothesized 
difference of 0! Performing a two-sample hypothesis test using a level of 
significance of α = 0.10, the advertiser can conclude that there is a difference in
the amounts of time adults ages 18 to 34 and adults ages 35 to 49 spend on social 
media each day. 




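You can also write the null
and alternative hypotheses
as shown below.

$$\begin{cases} H_0\colon \mu_1 - \mu_2 = 0 \\ H_a\colon \mu_1 - \mu_2 \neq 0 \end{cases} \qquad \begin{cases} H_0\colon \mu_1 - \mu_2 \le 0 \\ H_a\colon \mu_1 - \mu_2 > 0 \end{cases} \qquad \begin{cases} H_0\colon \mu_1 - \mu_2 \ge 0 \\ H_a\colon \mu_1 - \mu_2 < 0 \end{cases}$$

Figure: Sampling distribution for x̄₁ − x̄₂, centered at μ₁ − μ₂.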




It is important to remember that when you perform a two-sample hypothesis 
test using independent samples, you are testing a claim concerning the difference 
between the parameters in two populations, not the values of the parameters 
themselves. 


DEFINITION 


For a two-sample hypothesis test with independent samples, 


1. the null hypothesis H₀ is a statistical hypothesis that usually states there
is no difference between the parameters of two populations. The null
hypothesis always contains the symbol ≤, =, or ≥.


2. the alternative hypothesis Hₐ is a statistical hypothesis that is true when
H₀ is false. The alternative hypothesis contains the symbol >, ≠, or <.


To write the null and alternative hypotheses for a two-sample hypothesis 
test with independent samples, translate the claim made about the population 
parameters from a verbal statement to a mathematical statement. Then, write 
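its complementary statement. For instance, for a claim about two population
parameters μ₁ and μ₂, some possible pairs of null and alternative hypotheses are

$$\begin{cases} H_0\colon \mu_1 = \mu_2 \\ H_a\colon \mu_1 \neq \mu_2 \end{cases} \qquad \begin{cases} H_0\colon \mu_1 \le \mu_2 \\ H_a\colon \mu_1 > \mu_2 \end{cases} \qquad \text{and} \qquad \begin{cases} H_0\colon \mu_1 \ge \mu_2 \\ H_a\colon \mu_1 < \mu_2 \end{cases}$$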


Regardless of which hypotheses you use, you always assume there is no difference 
between the population means (μ₁ = μ₂).


Two-Sample z-Test for the Difference Between Means

In the remainder of this section, you will learn how to perform a z-test for the
difference between two population means μ₁ and μ₂ when the samples are
independent. These conditions are necessary to perform such a test.


1. The population standard deviations are known. 

2. The samples are randomly selected. 

3. The samples are independent. 

4. The populations are normally distributed or each sample size is at least 30. 


When these conditions are met, the sampling distribution for x̄₁ − x̄₂, the
difference of the sample means, is a normal distribution with mean and standard 
error as shown in the table below and the figure at the left. 


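In Words: The mean of the difference of the sample means is the assumed difference between the two population means. When no difference is assumed, the mean is 0.

In Symbols:
$$\text{Mean} = \mu_{\bar{x}_1 - \bar{x}_2} = \mu_{\bar{x}_1} - \mu_{\bar{x}_2} = \mu_1 - \mu_2$$

In Words: The variance of the sampling distribution is the sum of the variances of the individual sampling distributions for x̄₁ and x̄₂. The standard error is the square root of the sum of the variances.

In Symbols:
$$\text{Standard error} = \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\sigma_{\bar{x}_1}^2 + \sigma_{\bar{x}_2}^2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$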
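The claim that the variances add can be checked by simulation. The sketch below is an illustration under assumed values: the two normal populations (σ₁ = 10, σ₂ = 12, equal means) and the sample sizes (n₁ = 40, n₂ = 50) are made up for the demonstration and are not from the text. It draws many pairs of independent samples, records x̄₁ − x̄₂ each time, and compares the spread of those differences with the standard error formula.

```python
import random
from math import sqrt
from statistics import fmean, stdev

# Assumed, illustrative populations (not from the text)
MU1, SIGMA1, N1 = 50.0, 10.0, 40
MU2, SIGMA2, N2 = 50.0, 12.0, 50
REPS = 5000                      # number of simulated pairs of samples

rng = random.Random(2024)        # fixed seed so the run is reproducible


def sample_mean(mu: float, sigma: float, n: int) -> float:
    """Mean of one random sample of size n drawn from a normal population."""
    return fmean(rng.gauss(mu, sigma) for _ in range(n))


# One difference x1_bar - x2_bar per simulated pair of independent samples
diffs = [sample_mean(MU1, SIGMA1, N1) - sample_mean(MU2, SIGMA2, N2)
         for _ in range(REPS)]

theory_se = sqrt(SIGMA1**2 / N1 + SIGMA2**2 / N2)  # square root of the summed variances

print(f"mean of differences: {fmean(diffs):.3f}  (theory: {MU1 - MU2})")
print(f"SD of differences:   {stdev(diffs):.3f}  (theory: {theory_se:.3f})")
```

With these values the formula gives √(100/40 + 144/50) = √5.38 ≈ 2.32, and the simulated standard deviation should land close to it; raising REPS tightens the agreement.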




Picturing the World


There are about 110,799 public
elementary and secondary school
teachers in Georgia and about
107,385 in Ohio. In a survey,
200 public elementary and
secondary school teachers in
each state were asked to report
their salary. The results are shown
below. It is claimed that the mean
salary in Ohio is greater than the
mean salary in Georgia. (Source:
National Education Association)


Georgia: x̄₁ = $53,375, n₁ = 200

Ohio: x̄₂ = $56,150, n₂ = 200


Determine a null hypothesis and 
alternative hypothesis for this 
claim. 




When the conditions on the preceding page are met and the sampling
distribution for x̄₁ − x̄₂ is a normal distribution, you can use the z-test to test
the difference between two population means μ₁ and μ₂. The standardized test
statistic takes the form of

$$z = \frac{(\text{Observed difference}) - (\text{Hypothesized difference})}{\text{Standard error}}.$$


As you read the definition and guidelines for a two-sample z-test, note that if the
null hypothesis states μ₁ = μ₂, μ₁ ≤ μ₂, or μ₁ ≥ μ₂, then μ₁ = μ₂ is assumed
and the expression μ₁ − μ₂ is equal to 0.


Two-Sample z-Test for the Difference Between Means 


A two-sample z-test can be used to test the difference between two population
means μ₁ and μ₂ when these conditions are met.


1. Both σ₁ and σ₂ are known.
2. The samples are random.
3. The samples are independent.
4. The populations are normally distributed or both n₁ ≥ 30 and n₂ ≥ 30.

The test statistic is x̄₁ − x̄₂. The standardized test statistic is

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}} \qquad \text{where} \qquad \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.$$
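The standardized test statistic translates directly into a few lines of Python. This is a minimal sketch; the function name two_sample_z and its parameter hyp_diff (the value of μ₁ − μ₂ assumed under H₀) are hypothetical names chosen for this illustration.

```python
from math import sqrt


def two_sample_z(x1_bar: float, x2_bar: float,
                 sigma1: float, sigma2: float,
                 n1: int, n2: int,
                 hyp_diff: float = 0.0) -> float:
    """Standardized test statistic for the two-sample z-test.

    hyp_diff is the value of mu1 - mu2 assumed under H0 (0 when mu1 = mu2).
    """
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)   # standard error of x1_bar - x2_bar
    return ((x1_bar - x2_bar) - hyp_diff) / se


# Credit card data from Example 2 (California vs. Florida)
print(round(two_sample_z(3060, 2910, 960, 845, 250, 250), 2))
```

Because hyp_diff defaults to 0, the call matches the usual case in which H₀ assumes μ₁ = μ₂.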


GUIDELINES 


Using a Two-Sample z-Test for the Difference Between Means
(Independent Samples, σ₁ and σ₂ Known)


1. Verify that σ₁ and σ₂ are known, the samples are random and independent, and either the populations are normally distributed or both n₁ ≥ 30 and n₂ ≥ 30.

2. State the claim mathematically and verbally. Identify the null and alternative hypotheses.
In Symbols: State H₀ and Hₐ.

3. Specify the level of significance.
In Symbols: Identify α.

4. Determine the critical value(s).
In Symbols: Use Table 4 in Appendix B.

5. Determine the rejection region(s).

6. Find the standardized test statistic and sketch the sampling distribution.
In Symbols:
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}}$$

7. Make a decision to reject or fail to reject the null hypothesis.
In Symbols: If z is in the rejection region, then reject H₀. Otherwise, fail to reject H₀.

8. Interpret the decision in the context of the original claim.
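Steps 4 through 7 can be sketched in code as well. In this sketch, statistics.NormalDist().inv_cdf (Python 3.8 or later) replaces the lookup in Table 4 of Appendix B, so the critical values come out unrounded (for instance, about 1.95996 rather than 1.96). The function name and the tail labels "left", "right", and "two" are hypothetical.

```python
from statistics import NormalDist


def z_test_rejection_region(z: float, alpha: float, tail: str):
    """Critical value(s), rejection region, and decision for a z-test.

    tail: "left" (Ha: mu1 < mu2), "right" (Ha: mu1 > mu2), or "two" (Ha: mu1 != mu2).
    Returns (critical_values, reject_h0).
    """
    std_normal = NormalDist()                      # mean 0, standard deviation 1
    if tail == "left":
        z0 = std_normal.inv_cdf(alpha)             # rejection region: z < z0
        return [z0], z < z0
    if tail == "right":
        z0 = std_normal.inv_cdf(1 - alpha)         # rejection region: z > z0
        return [z0], z > z0
    if tail == "two":
        z0 = std_normal.inv_cdf(1 - alpha / 2)     # regions: z < -z0 or z > z0
        return [-z0, z0], abs(z) > z0
    raise ValueError("tail must be 'left', 'right', or 'two'")


critical, reject = z_test_rejection_region(1.85, 0.05, "two")
print([round(c, 2) for c in critical], "reject H0" if reject else "fail to reject H0")
```

With z = 1.85, α = 0.05, and a two-tailed test, the critical values are about ±1.96 and z falls outside the rejection regions, matching the decision in Example 2.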


A hypothesis test for the difference between means can also be performed
using P-values. Use the guidelines above, skipping Steps 4 and 5. After finding
the standardized test statistic, use Table 4 in Appendix B to calculate the
P-value. Then make a decision to reject or fail to reject the null hypothesis. If P
is less than or equal to α, then reject H₀. Otherwise, fail to reject H₀.
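The P-value approach replaces Steps 4 and 5 with an area calculation. A minimal sketch, assuming Python 3.8 or later for statistics.NormalDist; the helper names z_test_p_value and decide are hypothetical. Because the exact normal CDF is used instead of the four-decimal table, results can differ from table-based answers in the last digit.

```python
from statistics import NormalDist


def z_test_p_value(z: float, tail: str) -> float:
    """P-value for a standardized z statistic (in place of a Table 4 lookup)."""
    cdf = NormalDist().cdf
    if tail == "left":
        return cdf(z)                  # area to the left of z
    if tail == "right":
        return 1 - cdf(z)              # area to the right of z
    if tail == "two":
        return 2 * (1 - cdf(abs(z)))   # twice the area beyond |z|
    raise ValueError("tail must be 'left', 'right', or 'two'")


def decide(p_value: float, alpha: float) -> str:
    """Decision rule: reject H0 when P is less than or equal to alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


p = z_test_p_value(1.85, "two")
print(round(p, 4), decide(p, 0.05))
```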




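Sample Statistics for Credit Card Debt
California: x̄₁ = $3060, n₁ = 250
Florida: x̄₂ = $2910, n₂ = 250


Figure: Standard normal curve with rejection regions z < −1.96 (−z₀ = −1.96) and z > 1.96 (z₀ = 1.96); the standardized test statistic z ≈ 1.85 is marked outside the rejection regions.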


See TI-84 Plus 
steps on page 487. 


EXAMPLE 2

A Two-Sample z-Test for the Difference Between Means


A credit card watchdog group claims that there is a difference in the mean
credit card debts of people in California and Florida. The results of a random
survey of 250 people from each state are shown at the left. The two samples are
independent. Assume that σ₁ = $960 for California and σ₂ = $845 for Florida.
Do the results support the group’s claim? Use α = 0.05. (Adapted from Federal
Reserve Bank of New York)


SOLUTION 


Note that σ₁ and σ₂ are known, the samples are random and independent, and
both n₁ and n₂ are at least 30. So, you can use the z-test. The claim is “there is
a difference in the mean credit card debts of people in California and Florida.”
So, the null and alternative hypotheses are


H₀: μ₁ = μ₂ and Hₐ: μ₁ ≠ μ₂. (Claim)


Because the test is a two-tailed test and the level of significance is α = 0.05,
the critical values are −z₀ = −1.96 and z₀ = 1.96. The rejection regions are
z < −1.96 and z > 1.96. The standardized test statistic is


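$$\begin{aligned}
z &= \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} && \text{Use the z-test.}\\[4pt]
&= \frac{(3060 - 2910) - 0}{\sqrt{\dfrac{960^2}{250} + \dfrac{845^2}{250}}} && \text{Assume } \mu_1 = \mu_2 \text{, so } \mu_1 - \mu_2 = 0.\\[4pt]
&\approx 1.85. && \text{Round to two decimal places.}
\end{aligned}$$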


The figure at the left shows the location of the rejection regions and the 
standardized test statistic z. Because z is not in the rejection region, you fail to 
reject the null hypothesis. 


Interpretation There is not enough evidence at the 5% level of significance 
to support the group’s claim that there is a difference in the mean credit card 
debts of people in California and Florida. 


TRY IT YOURSELF 2 


A survey indicates that the mean annual wages for forensic science technicians 
working for local and state governments are $60,680 and $59,430, respectively. 
The survey includes a randomly selected sample of size 100 from each 
government branch. Assume that the population standard deviations are 
$6200 (local) and $5575 (state). The two samples are independent. At 
α = 0.10, is there enough evidence to conclude that there is a difference in the
mean annual wages? (Adapted from U.S. Bureau of Labor Statistics) 

Answer: Page A37 


In Example 2, you can also use a P-value to perform the hypothesis test. For
instance, the test is a two-tailed test, so the P-value is equal to twice the area to
the right of z = 1.85, or


P = 2(1 − 0.9678) = 2(0.0322) = 0.0644.


Because 0.0644 > 0.05, you fail to reject H₀.
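Both the test statistic and the P-value in Example 2 can be reproduced in a short script. This is a sketch with one point worth noting: p_table uses z rounded to 1.85, as in the text, while p_exact keeps every digit of z, so the two P-values differ slightly from each other and from the table-based 0.0644. Either way, P exceeds α = 0.05.

```python
from math import sqrt
from statistics import NormalDist

# Example 2 data: California (1) vs. Florida (2) credit card debt
x1_bar, sigma1, n1 = 3060, 960, 250
x2_bar, sigma2, n2 = 2910, 845, 250
alpha = 0.05

se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
z = ((x1_bar - x2_bar) - 0) / se                    # H0 assumes mu1 - mu2 = 0
p_table = 2 * (1 - NormalDist().cdf(round(z, 2)))   # z rounded to 1.85, as in the text
p_exact = 2 * (1 - NormalDist().cdf(abs(z)))        # unrounded z

print(f"z = {z:.4f}")
print(f"P (z rounded) = {p_table:.4f}, P (unrounded) = {p_exact:.4f}")
print("reject H0" if p_exact <= alpha else "fail to reject H0")
```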


EXAMPLE 3

Using Technology to Perform a Two-Sample z-Test


A travel agency claims that the average daily cost of meals and lodging for
vacationing in Texas is less than the average daily cost in Virginia. The table at
the left shows the results of a random survey of vacationers in each state. The
two samples are independent. Assume that σ₁ = $20 for Texas and σ₂ = $25
for Virginia, and that both populations are normally distributed. At α = 0.01,
is there enough evidence to support the claim? (Adapted from American
Automobile Association)


Sample Statistics for Daily Cost of Meals and Lodging for Two Adults
Texas: x̄₁ = $245, n₁ = 25
Virginia: x̄₂ = $251, n₂ = 20


SOLUTION


Note that σ₁ and σ₂ are known, the samples are random and independent, and
the populations are normally distributed. So, you can use the z-test. The claim
is “the average daily cost of meals and lodging for vacationing in Texas is less
than the average daily cost in Virginia.” So, the null and alternative hypotheses
are H₀: μ₁ ≥ μ₂ and Hₐ: μ₁ < μ₂ (claim). The top two displays show how to
set up the hypothesis test using a TI-84 Plus. The remaining displays show the
results of selecting Calculate or Draw.


TI-84 Plus displays (2-SampZTest setup): σ1: 20, σ2: 25, x̄1: 245, n1: 25, x̄2: 251, n2: 20, μ1: <μ2; Calculate Draw
TI-84 Plus display (2-SampZTest results): μ1<μ2, z=−.8728715609, p=.1913665007, x̄1=245, x̄2=251, n1=25
TI-84 Plus display (Draw): 2-SampZTest, μ1<μ2


Tech Tip
Note that the TI-84 Plus displays P = 0.1914. Because P > α, you fail to reject
the null hypothesis.


Sample Statistics for Daily Cost of Meals and Lodging for Two Adults
Alaska: n₁ = 15
Colorado: n₂ = 20


Because the test is a left-tailed test and α = 0.01, the rejection region is
z < −2.33. The standardized test statistic z ≈ −0.87 is not in the rejection
region, so you fail to reject the null hypothesis.


Interpretation There is not enough evidence at the 1% level of significance 
to support the travel agency’s claim. 
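The TI-84 Plus results in Example 3 can be checked with the same formulas. This sketch assumes Python 3.8 or later; the computed z should agree with the calculator's −.8728715609, and the left-tail area with its P-value, up to floating-point rounding. The critical value comes from NormalDist().inv_cdf rather than the table, so it is about −2.326 instead of −2.33.

```python
from math import sqrt
from statistics import NormalDist

# Example 3 data: Texas (1) vs. Virginia (2) daily cost of meals and lodging
x1_bar, sigma1, n1 = 245, 20, 25
x2_bar, sigma2, n2 = 251, 25, 20
alpha = 0.01

se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)   # sqrt(16 + 31.25)
z = (x1_bar - x2_bar) / se                   # H0 assumes mu1 - mu2 = 0
p = NormalDist().cdf(z)                      # left-tailed test: Ha is mu1 < mu2
z0 = NormalDist().inv_cdf(alpha)             # critical value for the left tail

print(f"z = {z:.10f}, P = {p:.10f}")
print("reject H0" if z < z0 else "fail to reject H0")
```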


TRY IT YOURSELF 3 


A travel agency claims that the average daily cost of meals and lodging for
vacationing in Alaska is greater than the average daily cost in Colorado. The
table at the left shows the results of a random survey of vacationers in each
state. The two samples are independent. Assume that σ₁ = $25 for Alaska and
σ₂ = $20 for Colorado, and that both populations are normally distributed.
At α = 0.05, is there enough evidence to support the claim? (Adapted from
American Automobile Association) Answer: Page A37




8.1 EXERCISES


Building Basic Skills and Vocabulary 


1. What is the difference between two samples that are dependent and two 
samples that are independent? Give an example of each. 


2. Explain how to perform a two-sample z-test for the difference between two
population means using independent samples with σ₁ and σ₂ known.


3. Describe another way you can perform a hypothesis test for the difference
between the means of two populations using independent samples with σ₁
and σ₂ known that does not use rejection regions.


4. What conditions are necessary in order to use the z-test to test the difference 
between two population means? 


Independent and Dependent Samples In Exercises 5-8, classify the two
samples as independent or dependent and justify your answer.


5. Sample 1: The maximum bench press weights for 53 football players 
Sample 2: The maximum bench press weights for the same 53 football 
players after completing a weight lifting program 


6. Sample 1: The IQ scores of 60 females 
Sample 2: The IQ scores of 60 males 


7. Sample 1: The average speed of 23 powerboats using an old hull design 
Sample 2: The average speed of 14 powerboats using a new hull design 


8. Sample 1: The commute times of 10 workers when they use their own 
vehicles 
Sample 2: The commute times of the same 10 workers when they use 
public transportation 


In Exercises 9 and 10, use the TI-84 Plus display to make a decision to reject or 
fail to reject the null hypothesis at the level of significance. Make your decision 
using the standardized test statistic and using the P-value. Assume the sample sizes 
are equal. 


9. α = 0.05    10. α = 0.01


TI-84 Plus displays (2-SampZTest results) for Exercises 9 and 10; the display for Exercise 10 shows z = 1.941656865.


In Exercises 11-14, test the claim about the difference between two population
means μ₁ and μ₂ at the level of significance α. Assume the samples are random and
independent, and the populations are normally distributed.


11. Claim: μ₁ = μ₂; α = 0.1
Population statistics: σ₁ = 3.4 and σ₂ = 1.5
Sample statistics: x̄₁ = 16, n₁ = 29 and x̄₂ = 14, n₂ = 28




12. Claim: μ₁ > μ₂; α = 0.10
Population statistics: σ₁ = 40 and σ₂ = 15
Sample statistics: x̄₁ = 500, n₁ = 100 and x̄₂ = 495, n₂ = 75


13. Claim: μ₁ < μ₂; α = 0.05
Population statistics: σ₁ = 75 and σ₂ = 105
Sample statistics: x̄₁ = 2435, n₁ = 35 and x̄₂ = 2432, n₂ = 90


14. Claim: μ₁ = μ₂; α = 0.03
Population statistics: σ₁ = 136 and σ₂ = 215
Sample statistics: x̄₁ = 5004, n₁ = 144 and x̄₂ = 4895, n₂ = 156


Using and Interpreting Concepts 


Testing the Difference Between Two Means In Exercises 15-24, 
(a) identify the claim and state H₀ and Hₐ, (b) find the critical value(s) and identify
the rejection region(s), (c) find the standardized test statistic z, (d) decide whether 
to reject or fail to reject the null hypothesis, and (e) interpret the decision in the 
context of the original claim. Assume the samples are random and independent, 
and the populations are normally distributed. 


15. Braking Distances To compare the dry braking distances from 60 to 0
miles per hour for two makes of automobiles, a safety engineer conducts
braking tests for 23 models of Make A and 24 models of Make B. The mean
braking distance for Make A is 137 feet. Assume the population standard
deviation is 5.5 feet. The mean braking distance for Make B is 132 feet.
Assume the population standard deviation is 6.7 feet. At α = 0.10, can the
engineer support the claim that the mean braking distances are different for
the two makes of automobiles? (Source: Consumer Reports)


16. Digital Gear Shopping To compare customer satisfaction with holiday
gift purchases of digital gear from online and walk-in retailers, a researcher
randomly selects 30 customer ratings of online retailers and 31 customer
ratings of walk-in retailers. The mean customer rating of online retailers is
90 out of 100. Assume the population standard deviation is 3.4. The mean
customer rating of walk-in retailers is 88 out of 100. Assume the population
standard deviation is 3.5. At α = 0.01, can the researcher support the claim
that the mean customer rating of online retailers is greater than the mean
customer rating of walk-in retailers? (Source: Consumer Reports)


17. Rainfall A non-governmental organization wants to choose between
two regions in a state to initiate a campaign for rainwater harvesting. A
researcher claims that Region A receives less rainfall than Region B. To
test the regions, the average rainfall is calculated for 60 days in each region.
The mean rainfall in Region A is 700 millimeters. Assume the population
standard deviation is 60 millimeters. The mean rainfall in Region B is
725 millimeters. Assume the population standard deviation is 66 millimeters.
At α = 0.01, can the organization support the researcher’s claim?


18. Running Costs: Cars You want to buy a car, and a salesperson tells you that
the mean running costs for Model A and Model B are equal. You research the
running costs. The mean running cost of 24 Model A cars is $5 per kilometer.
Assume the population standard deviation is $1.50. The mean running cost
of 26 Model B cars is $6.50 per kilometer. Assume the population standard
deviation is $2.50. At α = 0.05, can you reject the salesperson’s claim?




19. ACT Mathematics and Science Scores The mean ACT mathematics score
for 60 high school students is 20.6. Assume the population standard deviation
is 5.4. The mean ACT science score for 75 high school students is 20.8. Assume
the population standard deviation is 5.6. At α = 0.01, can you reject the claim
that ACT mathematics and science scores are equal? (Source: ACT, Inc.)


20. ACT English and Reading Scores The mean ACT English score for
120 high school students is 20.1. Assume the population standard deviation
is 6.8. The mean ACT reading score for 150 high school students is 21.3.
Assume the population standard deviation is 6.5. At α = 0.10, can you
support the claim that ACT reading scores are higher than ACT English
scores? (Source: ACT, Inc.)


21. Home Prices A real estate agency says that the mean home sales price in
Casper, Wyoming, is the same as in Cheyenne, Wyoming. The mean home
sales price for 25 homes in Casper is $294,220. Assume the population
standard deviation is $135,387. The mean home sales price for 25 homes
in Cheyenne is $287,984. Assume the population standard deviation is
$151,996. At α = 0.01, is there enough evidence to reject the agency’s claim?
(Adapted from RealtyTrac)


22. Home Prices Refer to Exercise 21. Two more samples are taken, one from
Casper and one from Cheyenne. For 50 homes in Casper, x̄₁ = $231,581.
For 50 homes in Cheyenne, x̄₂ = $315,706. Use α = 0.01. Do the new
samples lead to a different conclusion? (Adapted from RealtyTrac)


23. Precipitation A climatologist claims that the precipitation in Seattle,
Washington, was greater than in Birmingham, Alabama, in a recent
year. The daily precipitation amounts (in inches) for 30 days in a recent
year in Seattle are shown below. Assume the population standard
deviation is 0.24 inch.


0.00 0.33 0.00 0.80 0.47 0.00 0.18 0.00 0.00 0.00 
0.01 0.01 0.01 0.12 0.00 0.00 0.00 0.00 0.00 0.00 
0.17 0.00 0.00 0.00 0.61 1.19 0.00 0.00 0.00 0.81 


The daily precipitation amounts (in inches) for 30 days in a recent year 
in Birmingham are shown below. Assume the population standard 
deviation is 0.33 inch. 


0.00 0.00 0.00 0.01 0.00 0.00 0.00 1.82 0.10 0.00 
0.00 0.00 0.00 0.00 0.00 0.32 0.01 0.15 0.00 0.01 
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 


At α = 0.05, can you support the climatologist’s claim? (Source: NOAA)


24. Temperature A climatologist claims that the temperature in Seattle,
Washington, was lower than in Birmingham, Alabama, in a recent year.
The maximum daily temperatures (in degrees Fahrenheit) for 30 days
in a recent year in Seattle are shown below. Assume the population
standard deviation is 13.6°F.


51 49 47 48 49 50 75 61 87 72 62 72 84 75 68 
73 72 92 64 72 66 59 59 61 57 57 45 46 46 37 


The maximum daily temperatures (in degrees Fahrenheit) for 30 days in 
a recent year in Birmingham are shown below. Assume the population 
standard deviation is 15.4°F. 


43 62 61 71 54 69 79 79 84 82 81 84 90 97 95 
89 95 93 94 94 87 84 81 72 72 69 66 37 47 64 


At α = 0.01, can you support the climatologist’s claim? (Source: NOAA)




Entry level software engineers in Raleigh, NC: x̄₁ = $64,270, n₁ = 42
Entry level software engineers in Wichita, KS: x̄₂ = $62,610, n₂ = 38


FIGURE FOR EXERCISE 27




25. Getting at the Concept Explain why the null hypothesis H₀: μ₁ ≥ μ₂ is
equivalent to the null hypothesis H₀: μ₁ − μ₂ ≥ 0.


26. Getting at the Concept Explain why the null hypothesis H₀: μ₁ ≤ μ₂ is
equivalent to the null hypothesis H₀: μ₁ − μ₂ ≤ 0.


Extending Concepts 


Testing a Difference Other Than Zero Sometimes a researcher is
interested in testing a difference in means other than zero. In Exercises 27 and
28, you will test the difference between two means using a null hypothesis of
H₀: μ₁ − μ₂ = k, H₀: μ₁ − μ₂ ≥ k, or H₀: μ₁ − μ₂ ≤ k. The standardized test
statistic is still

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}} \qquad \text{where} \qquad \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.$$
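Only the numerator changes when the hypothesized difference is k rather than 0. The numbers below are made up purely to show the arithmetic (a difference of 6, a standard error of 0.5, and k = 5) and are not from Exercise 27 or 28; the function name z_for_difference is hypothetical.

```python
from math import sqrt


def z_for_difference(x1_bar, x2_bar, sigma1, sigma2, n1, n2, k=0.0):
    """Two-sample z statistic when H0 assumes mu1 - mu2 = k."""
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return ((x1_bar - x2_bar) - k) / se   # only the numerator depends on k


# Made-up numbers: x1_bar - x2_bar = 6 and se = sqrt(4/32 + 4/32) = 0.5
z_k0 = z_for_difference(26.0, 20.0, 2.0, 2.0, 32, 32, k=0.0)
z_k5 = z_for_difference(26.0, 20.0, 2.0, 2.0, 32, 32, k=5.0)
print(z_k0, z_k5)
```

The same observed difference gives z = 12 when H₀ assumes a difference of 0 but only z = 2 when H₀ assumes a difference of 5, which is why the choice of k can change the decision.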


27. Software Engineer Salaries Is the difference between the mean annual
salaries of entry level software engineers in Raleigh, North Carolina,
and Wichita, Kansas, more than $2000? To decide, you select a random
sample of entry level software engineers from each city. The results of each
survey are shown in the figure at the left. Assume the population standard
deviations are σ₁ = $10,850 and σ₂ = $10,970. At α = 0.05, what should
you conclude? (Adapted from Salary.com)


28. Architect Salaries Is the difference between the mean annual salaries of
entry level architects in Denver, Colorado, and Los Angeles, California,
equal to $10,000? To decide, you select a random sample of entry level
architects from each city. The results of each survey are shown in the figure.
Assume the population standard deviations are σ₁ = $6520 and σ₂ = $7130.
At α = 0.01, what should you conclude? (Adapted from Salary.com)


Entry level architects in Denver, CO: x̄₁ = $50,410, n₁ = 32
Entry level architects in Los Angeles, CA: x̄₂ = $54,640, n₂ = 30


Constructing Confidence Intervals for μ₁ − μ₂ You can construct a
confidence interval for the difference between two population means μ₁ − μ₂, as
shown below, when both population standard deviations are known, and either
both populations are normally distributed or both n₁ ≥ 30 and n₂ ≥ 30. Also, the
samples must be randomly selected and independent.

$$(\bar{x}_1 - \bar{x}_2) - z_c\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + z_c\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
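The interval formula can be sketched as a small function. This sketch assumes Python 3.8 or later, uses NormalDist().inv_cdf to find z_c instead of a table, and runs on made-up numbers (a difference of 6 and a standard error of 0.5) rather than exercise data; diff_means_ci is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist


def diff_means_ci(x1_bar, x2_bar, sigma1, sigma2, n1, n2, c=0.95):
    """c-confidence interval for mu1 - mu2 when sigma1 and sigma2 are known."""
    z_c = NormalDist().inv_cdf((1 + c) / 2)      # critical value, about 1.96 for c = 0.95
    margin = z_c * sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    diff = x1_bar - x2_bar                       # point estimate of mu1 - mu2
    return diff - margin, diff + margin


# Made-up numbers: x1_bar - x2_bar = 6 and standard error sqrt(4/32 + 4/32) = 0.5
low, high = diff_means_ci(26.0, 20.0, 2.0, 2.0, 32, 32, c=0.95)
print(f"{low:.2f} < mu1 - mu2 < {high:.2f}")
```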


In Exercises 29 and 30, construct the indicated confidence interval for μ₁ − μ₂.


29. Software Engineer Salaries Construct a 95% confidence interval for 
the difference between the mean annual salaries of entry level software 
engineers in Raleigh, North Carolina, and Wichita, Kansas, using the data 
from Exercise 27. 


30. Architect Salaries Construct a 99% confidence interval for the difference 
between the mean annual salaries of entry level architects in Denver, 
Colorado, and Los Angeles, California, using the data from Exercise 28. 




8.2 


What You Should Learn 


» How to perform a two-sample
t-test for the difference between
two means μ₁ and μ₂ using
independent samples with σ₁
and σ₂ unknown


Study Tip


To perform the two-sample
t-test described at the right,
you will need to know
whether the variances of
two populations are equal.
In this chapter, each
example and exercise will state
whether the variances are equal.
You will learn to test for differences
between two population variances in
Chapter 10.


Testing the Difference Between Means
(Independent Samples, σ₁ and σ₂ Unknown)


The Two-Sample t-Test for the Difference Between Means 


The Two-Sample t-Test for the Difference
Between Means


In Section 8.1, you learned how to test the difference between means when both
population standard deviations are known. In many real-life situations, the
population standard deviations are not known. In this section, you will learn how
to use a t-test to test the difference between two population means μ₁ and μ₂
using independent samples from each population when σ₁ and σ₂ are unknown.
These conditions are necessary to perform such a test: (1) the population standard
deviations are unknown, (2) the samples are randomly selected, (3) the samples
are independent, and (4) the populations are normally distributed or each sample
size is at least 30. When these conditions are met, the sampling distribution for the
difference between the sample means x̄₁ − x̄₂ is approximated by a t-distribution
with mean μ₁ − μ₂. So, you can use a two-sample t-test to test the difference
between the population means μ₁ and μ₂. The standard error and the degrees of
freedom of the sampling distribution depend on whether the population variances
σ₁² and σ₂² are equal, as shown in the next definition.


Two-Sample t-Test for the Difference Between Means 


A two-sample t-test is used to test the difference between two population means μ₁ and μ₂ when (1) σ₁ and σ₂ are unknown, (2) the samples are random, (3) the samples are independent, and (4) the populations are normally distributed or both n₁ ≥ 30 and n₂ ≥ 30. The test statistic is x̄₁ − x̄₂, and the standardized test statistic is

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}}.$$

Variances are equal: If the population variances are equal, then information from the two samples is combined to calculate a pooled estimate of the standard deviation σ̂.

$$\hat{\sigma} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

The standard error for the sampling distribution of x̄₁ − x̄₂ is

$$s_{\bar{x}_1 - \bar{x}_2} = \hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \qquad \text{(variances equal)}$$

and d.f. = n₁ + n₂ − 2.

Variances are not equal: If the population variances are not equal, then the standard error is

$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \qquad \text{(variances not equal)}$$

and d.f. = smaller of n₁ − 1 and n₂ − 1.
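The two standard-error rules in this definition can be sketched in Python. This is a minimal illustration that assumes only the summary statistics s₁, n₁, s₂, n₂ are available; the function name `standard_error_and_df` is hypothetical, and the unequal-variance branch uses the conservative d.f. rule from the definition (smaller of n₁ − 1 and n₂ − 1), not the formula some software uses.

```python
import math


def standard_error_and_df(s1, n1, s2, n2, equal_var):
    """Return (standard error of xbar1 - xbar2, degrees of freedom).

    Hypothetical helper following the definition:
    equal_var=True  -> pooled estimate sigma-hat, d.f. = n1 + n2 - 2
    equal_var=False -> unpooled estimate, d.f. = smaller of n1 - 1 and n2 - 1
    """
    if equal_var:
        # Pooled estimate of the common population standard deviation.
        sigma_hat = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
        se = sigma_hat * math.sqrt(1 / n1 + 1 / n2)
        df = n1 + n2 - 2
    else:
        # Each sample contributes its own variance estimate.
        se = math.sqrt(s1**2 / n1 + s2**2 / n2)
        df = min(n1 - 1, n2 - 1)
    return se, df
```

For instance, `standard_error_and_df(0.05, 30, 0.07, 32, equal_var=True)` returns a standard error of about 0.0155416 with 60 degrees of freedom, which matches the sedan example later in this section.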


SECTION 8.2 Testing the Difference Between Means (Independent Samples, a; and 0 Unknown) 451 


Picturing the World


A study published by the American 
Psychological Association in the 
journal Neuropsychology reported 
that children with musical training 
showed better verbal memory than 
children with no musical training. 
The study also showed that the 
longer the musical training, the 
better the verbal memory. Suppose 
you tried to duplicate the results 
as follows. A verbal memory test 
with a possible 100 points was 
administered to 90 children. Half 
had musical training, while the 
other half had no training and 
acted as the control group. The 45 children with training had an average score of 83.12 with a standard deviation of 5.7. The 45 children in the control group had an average score of 79.9 with a standard deviation of 6.2.

At α = 0.05, is there enough evidence to support the claim that children with musical training have better verbal memory test scores than those without training? Assume the population variances are equal.

The requirements for the z-test described in Section 8.1 and the t-test described in this section are shown in the flowchart below.

Two-Sample Tests for Independent Samples

Are both population standard deviations known, and are both populations normal or are both sample sizes at least 30?
  Yes: Use the z-test.
  No: Are both populations normal or are both sample sizes at least 30?
    No: You cannot use the z-test or the t-test.
    Yes: Are the population variances equal?
      Yes: Use the t-test with $s_{\bar{x}_1 - \bar{x}_2} = \hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$ and d.f. = n₁ + n₂ − 2.
      No: Use the t-test with $s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ and d.f. = smaller of n₁ − 1 and n₂ − 1.

GUIDELINES

Using a Two-Sample t-Test for the Difference Between Means
(Independent Samples, σ₁ and σ₂ Unknown)


In Words | In Symbols

1. Verify that σ₁ and σ₂ are unknown, the samples are random and independent, and either the populations are normally distributed or both n₁ ≥ 30 and n₂ ≥ 30.
2. State the claim mathematically and verbally. Identify the null and alternative hypotheses. | State H₀ and Hₐ.
3. Specify the level of significance. | Identify α.
4. Determine the degrees of freedom. | d.f. = n₁ + n₂ − 2 or d.f. = smaller of n₁ − 1 and n₂ − 1
5. Determine the critical value(s). | Use Table 5 in Appendix B.
6. Determine the rejection region(s).
7. Find the standardized test statistic and sketch the sampling distribution. | $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}}$
8. Make a decision to reject or fail to reject the null hypothesis. | If t is in the rejection region, then reject H₀. Otherwise, fail to reject H₀.
9. Interpret the decision in the context of the original claim.
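Steps 4 through 8 of these guidelines can be sketched as one Python function. This is a hedged illustration, not a replacement for the table procedure: the name `two_sample_t_test` and its parameters are hypothetical, and because Python's standard library has no t-distribution, the critical value t₀ is passed in after you look it up in Table 5 for the d.f. from step 4.

```python
import math


def two_sample_t_test(x1, s1, n1, x2, s2, n2, t0, tail, equal_var, delta0=0.0):
    """Sketch of the guideline steps; returns (t, d.f., reject H0?).

    t0     : positive critical value from a t-table for the d.f. of step 4
    tail   : "left", "right", or "two" (shape of the alternative hypothesis)
    delta0 : hypothesized value of mu1 - mu2 under H0 (usually 0)
    """
    # Step 4: degrees of freedom and the matching standard error.
    if equal_var:
        sigma_hat = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
        se = sigma_hat * math.sqrt(1 / n1 + 1 / n2)
        df = n1 + n2 - 2
    else:
        se = math.sqrt(s1**2 / n1 + s2**2 / n2)
        df = min(n1 - 1, n2 - 1)
    # Step 7: standardized test statistic.
    t = ((x1 - x2) - delta0) / se
    # Steps 5, 6, and 8: compare t with the rejection region(s).
    if tail == "left":
        reject = t < -t0
    elif tail == "right":
        reject = t > t0
    elif tail == "two":
        reject = abs(t) > t0
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return t, df, reject
```

With the statistics from Example 1 below, `two_sample_t_test(473, 39.7, 8, 459, 24.5, 18, t0=1.895, tail="two", equal_var=False)` gives t ≈ 0.922 with 7 degrees of freedom and does not reject H₀; with the sedan data of Example 2 (`equal_var=True`, `t0=1.671`, `tail="left"`), it gives t ≈ −1.930 with 60 degrees of freedom and rejects H₀.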


452 CHAPTER 8 Hypothesis Testing with Two Samples 


Sample Statistics for State Mathematics Test Scores

Teacher 1          Teacher 2
x̄₁ = 473           x̄₂ = 459
s₁ = 39.7          s₂ = 24.5
n₁ = 8             n₂ = 18


Sample Statistics for Annual Earnings

High school diploma    Associate’s degree
x̄₁ = $36,875           x̄₂ = $44,900
s₁ = $5475             s₂ = $8580
n₁ = 25                n₂ = 16


(Figure: TI-84 Plus 2-SampTTest screen)


See Minitab steps 
on page 486. 


EXAMPLE 1
A Two-Sample t-Test for the Difference Between Means


The results of a state mathematics test for random samples of students taught by two different teachers at the same school are shown at the left. Can you conclude that there is a difference in the mean mathematics test scores for the students of the two teachers? Use α = 0.10. Assume the populations are normally distributed and the population variances are not equal.


SOLUTION 


Note that σ₁ and σ₂ are unknown, the samples are random and independent, and the populations are normally distributed. So, you can use the t-test. The claim is “there is a difference in the mean mathematics test scores for the students of the two teachers.” So, the null and alternative hypotheses are

H₀: μ₁ = μ₂ and Hₐ: μ₁ ≠ μ₂. (Claim)

Because the population variances are not equal and the smaller sample size is 8, use d.f. = 8 − 1 = 7. The test is a two-tailed test with d.f. = 7 and α = 0.10, so the critical values are −t₀ = −1.895 and t₀ = 1.895. The rejection regions are t < −1.895 and t > 1.895. The standardized test statistic is


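$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
(Use the t-test; variances are not equal.)

$$= \frac{(473 - 459) - 0}{\sqrt{\frac{(39.7)^2}{8} + \frac{(24.5)^2}{18}}}$$
(Assume μ₁ = μ₂, so μ₁ − μ₂ = 0.)

$$\approx 0.922$$
(Round to three decimal places.)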


The figure at the left shows the location of the rejection regions and the standardized test statistic t. Because t is not in the rejection region, you fail to reject the null hypothesis.


Interpretation There is not enough evidence at the 10% level of significance 
to support the claim that the mean mathematics test scores for the students of 
the two teachers are different. 


TRY IT YOURSELF 1 


The annual earnings of 25 people with a high school diploma and 16 people with an associate’s degree are shown at the left. Can you conclude that there is a difference in the mean annual earnings based on level of education? Use α = 0.05. Assume the populations are normally distributed and the population variances are not equal. (Adapted from U.S. Census Bureau)

Answer: Page A37 


You can also use technology and a P-value to perform a hypothesis test for the difference between means. For instance, in Example 1, you can enter the data in a TI-84 Plus, as shown at the left, and find P ≈ 0.379. Because P > α, you fail to reject the null hypothesis. Note that when using technology, the number of degrees of freedom for the t-test is often determined by the formula


$$\text{d.f.} = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$

This formula will not be used in the text.
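The degrees-of-freedom formula above (commonly called the Welch-Satterthwaite approximation) is short enough to sketch directly. The function name `welch_df` is a hypothetical label; the arithmetic is exactly the displayed formula.

```python
def welch_df(s1, n1, s2, n2):
    """Non-integer d.f. from the formula above (Welch-Satterthwaite)."""
    v1 = s1**2 / n1  # estimated variance of the first sample mean
    v2 = s2**2 / n2  # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
```

For Example 1, `welch_df(39.7, 8, 24.5, 18)` is about 9.46, larger than the conservative value d.f. = 7 used in the text, which is one reason P-values from software can differ slightly from table-based conclusions.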




Sample Statistics for Sedan Driving Costs

Manufacturer        Competitor
x̄₁ = $0.48/mi       x̄₂ = $0.51/mi
s₁ = $0.05/mi       s₂ = $0.07/mi
n₁ = 30             n₂ = 32


Tech Tip

It is important to note that when using a TI-84 Plus for the two-sample t-test, select the Pooled: Yes input option when the variances are equal.


(Figure: rejection region t < −1.671 with the standardized test statistic t ≈ −1.930)


See TI-84 Plus 
steps on page 487. 


EXAMPLE 2
A Two-Sample t-Test for the Difference Between Means


A manufacturer claims that the mean driving cost per mile of its sedans is less than that of its leading competitor. You conduct a study using 30 randomly selected sedans from the manufacturer and 32 from the leading competitor. The results are shown at the left. At α = 0.05, can you support the manufacturer’s claim? Assume the population variances are equal. (Adapted from American Automobile Association)


SOLUTION 


Note that σ₁ and σ₂ are unknown, the samples are random and independent, and both n₁ and n₂ are at least 30. So, you can use the t-test. The claim is “the mean driving cost per mile of the manufacturer’s sedans is less than that of its leading competitor.” So, the null and alternative hypotheses are

H₀: μ₁ ≥ μ₂ and Hₐ: μ₁ < μ₂. (Claim)

The population variances are equal, so d.f. = n₁ + n₂ − 2 = 30 + 32 − 2 = 60. Because the test is a left-tailed test with d.f. = 60 and α = 0.05, the critical value is t₀ = −1.671. The rejection region is t < −1.671. To make the calculation of the standardized test statistic easier, first find the standard error.


$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \cdot \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

$$= \sqrt{\frac{(30 - 1)(0.05)^2 + (32 - 1)(0.07)^2}{30 + 32 - 2}} \cdot \sqrt{\frac{1}{30} + \frac{1}{32}} \approx 0.0155416$$

The standardized test statistic is

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}}$$
(Use the t-test; variances are equal.)

$$= \frac{(0.48 - 0.51) - 0}{0.0155416}$$
(Assume μ₁ = μ₂, so μ₁ − μ₂ = 0.)

$$\approx -1.930$$
(Round to three decimal places.)

The figure at the left shows the location of the rejection region and the standardized test statistic t. Because t is in the rejection region, you reject the null hypothesis.


Interpretation There is enough evidence at the 5% level of significance to 
support the manufacturer’s claim that the mean driving cost per mile of its 
sedans is less than that of its competitor’s. 


TRY IT YOURSELF 2 


A manufacturer claims that the mean driving cost per mile of its minivans is less than that of its leading competitor. You conduct a study using 34 randomly selected minivans from the manufacturer and 38 from the leading competitor. The results are shown at the right. At α = 0.10, can you support the manufacturer’s claim? Assume the population variances are equal. (Adapted from American Automobile Association)

Sample Statistics for Minivan Driving Costs

Manufacturer        Competitor
x̄₁ = $0.52/mi       x̄₂ = $0.54/mi
s₁ = $0.08/mi       s₂ = $0.07/mi
n₁ = 34             n₂ = 38

Answer: Page A37




8.2 EXERCISES 


Sample Statistics for Annual Costs of Routine Veterinarian Visits

Dogs            Cats
s₁ = $30        s₂ = $27
n₁ = 16         n₂ = 18

TABLE FOR EXERCISE 13




Building Basic Skills and Vocabulary 


1. What conditions are necessary in order to use the t-test to test the difference between two population means?

2. Explain how to perform a two-sample t-test for the difference between two population means.


In Exercises 3-8, use Table 5 in Appendix B to find the critical value(s) for the alternative hypothesis, level of significance α, and sample sizes n₁ and n₂. Assume that the samples are random and independent, the populations are normally distributed, and the population variances are (a) equal and (b) not equal.

3. Hₐ: μ₁ ≠ μ₂, α = 0.10, n₁ = 11, n₂ = 14
4. Hₐ: μ₁ > μ₂, α = 0.01, n₁ = 12, n₂ = 15
5. Hₐ: μ₁ < μ₂, α = 0.05, n₁ = 7, n₂ = 11
6. Hₐ: μ₁ ≠ μ₂, α = 0.01, n₁ = 19, n₂ = 22
7. Hₐ: μ₁ > μ₂, α = 0.05, n₁ = 13, n₂ = 8
8. Hₐ: μ₁ < μ₂, α = 0.10, n₁ = 30, n₂ = 32


In Exercises 9-12, test the claim about the difference between two population means μ₁ and μ₂ at the level of significance α. Assume the samples are random and independent, and the populations are normally distributed.

9. Claim: μ₁ = μ₂; α = 0.01. Assume σ₁² = σ₂².
Sample statistics: x̄₁ = 33.7, s₁ = 3.5, n₁ = 12 and x̄₂ = 35.5, s₂ = 2.2, n₂ = 17

10. Claim: μ₁ < μ₂; α = 0.10. Assume σ₁² = σ₂².
Sample statistics: x̄₁ = 0.345, s₁ = 0.305, n₁ = 11 and x̄₂ = 0.515, s₂ = 0.215, n₂ = 9

11. Claim: μ₁ = μ₂; α = 0.05. Assume σ₁² ≠ σ₂².
Sample statistics: x̄₁ = 2410, s₁ = 175, n₁ = 13 and x̄₂ = 2305, s₂ = 52, n₂ = 10

12. Claim: μ₁ > μ₂; α = 0.01. Assume σ₁² ≠ σ₂².
Sample statistics: x̄₁ = 52, s₁ = 4.8, n₁ = 32 and x̄₂ = 50, s₂ = 1.2, n₂ = 40


Using and Interpreting Concepts 


Testing the Difference Between Two Means In Exercises 13-22, (a) identify the claim and state H₀ and Hₐ, (b) find the critical value(s) and identify the rejection region(s), (c) find the standardized test statistic t, (d) decide whether to reject or fail to reject the null hypothesis, and (e) interpret the decision in the context of the original claim. Assume the samples are random and independent, and the populations are normally distributed.

13. Veterinarian Visits A pet association claims that the mean annual costs of routine veterinarian visits for dogs and cats are the same. The results for samples of the two types of pets are shown at the left. At α = 0.10, can you reject the pet association’s claim? Assume the population variances are equal. (Adapted from American Pet Products Association)




Sample Statistics for Amount Spent by Customers

Movies          Amusement Parks
s₁ = $9         s₂ = $7
n₁ = 26         n₂ = 36

TABLE FOR EXERCISE 14


FIGURE FOR EXERCISE 16 


14. Entertainment A channel claims that the mean amount of money spent by a family at movie theaters is greater than the mean amount spent by a family at amusement parks. The results for samples of expenses for the two modes of entertainment are shown at the left. At α = 0.01, can you support the channel’s claim? Assume the population variances are equal.

15. Manufacturing Defects A manager claims that the mean number of defects in units produced at Site A is greater than the mean number of defects in units produced at Site B. The sample of 23 units from Site A has a mean of 78 defects and a standard deviation of 6 defects. The sample of 19 units from Site B has a mean of 75 defects and a standard deviation of 4 defects. At α = 0.05, can you support the manager’s claim? Assume the population variances are equal.

16. Yellowfin Tuna A marine biologist claims that the mean fork length (see figure at the left) of yellowfin tuna is different in two zones in the eastern tropical Pacific Ocean. A sample of 26 yellowfin tuna collected in Zone A has a mean fork length of 76.2 centimeters and a standard deviation of 16.5 centimeters. A sample of 31 yellowfin tuna collected in Zone B has a mean fork length of 80.8 centimeters and a standard deviation of 23.4 centimeters. At α = 0.01, can you support the marine biologist’s claim? Assume the population variances are equal. (Adapted from Fishery Bulletin)

17. Annual Income A demographics researcher claims that the mean household income in a recent year is greater in Cuyahoga County, Ohio, than it is in Wayne County, Michigan. In Cuyahoga County, a sample of 19 residents has a mean household income of $45,600 and a standard deviation of $2800. In Wayne County, a sample of 15 residents has a mean household income of $41,500 and a standard deviation of $1310. At α = 0.05, can you support the demographics researcher’s claim? Assume the population variances are not equal. (Adapted from U.S. Census Bureau)

18. Annual Income A demographics researcher claims that the mean household income in a recent year is the same in Ada County, Idaho, and Cameron Parish, Louisiana. In Ada County, a sample of 18 residents has a mean household income of $58,300 and a standard deviation of $9000. In Cameron Parish, a sample of 20 residents has a mean household income of $56,600 and a standard deviation of $15,600. At α = 0.10, can you reject the demographics researcher’s claim? Assume the population variances are not equal. (Adapted from U.S. Census Bureau)


19. Tensile Strength The tensile strength of a metal is a measure of its ability to resist tearing when it is pulled lengthwise. An experimental method of treatment produced steel bars with the tensile strengths (in newtons per square millimeter) listed below.

Experimental Method:
391 383 333 378 368 401 339 376 366 348

The conventional method produced steel bars with the tensile strengths (in newtons per square millimeter) listed below.

Conventional Method:
362 382 368 398 381 391 400
410 396 411 385 385 395 371

At α = 0.01, can you support the claim that the experimental method of treatment makes a difference in the tensile strength of steel bars? Assume the population variances are equal.




20. Tensile Strength An engineer wants to compare the tensile strengths of steel bars that are produced using a conventional method and an experimental method. (The tensile strength of a metal is a measure of its ability to resist tearing when pulled lengthwise.) To do so, the engineer randomly selects steel bars that are manufactured using each method and records the tensile strengths (in newtons per square millimeter) listed below.

Experimental Method:
395 389 421 394 407 411 389 402 422
416 402 408 400 386 411 405 389 410

Conventional Method:
362 352 380 382 413 384 400
378 419 379 384 388 372 383

At α = 0.10, can the engineer support the claim that the experimental method produces steel with a greater mean tensile strength? Assume the population variances are not equal.

21. Teaching Methods A new method of teaching reading is being tested on third grade students. A group of third grade students is taught using the new curriculum. A control group of third grade students is taught using the old curriculum. The reading test scores for the two groups are shown in the back-to-back stem-and-leaf plot.

         Old Curriculum |   | New Curriculum
                      9 | 3 |
                    9 9 | 4 | 3
        9 8 8 4 3 3 2 1 | 5 | 2 4
        7 6 4 2 2 1 0 0 | 6 | 0 1 1 4 7 7 7 7 7 8 9 9
                        | 7 | 0 1 1 2 3 3 4 9
                        | 8 | 2 4

Key: 9|4|3 = 49 for old curriculum and 43 for new curriculum

At α = 0.10, is there enough evidence to support the claim that the new method of teaching reading produces higher reading test scores than the old method does? Assume the population variances are equal.


22. Teaching Methods Two teaching methods and their effects on science test scores are being reviewed. A group of students is taught in traditional lab sessions. A second group of students is taught using interactive simulation software. The science test scores for the two groups are shown in the back-to-back stem-and-leaf plot.

        Traditional Lab |   | Interactive Simulation Software
                      4 | 6 |
  9 9 8 8 7 6 6 3 2 1 0 | 7 | 0 4 5 5 7 7 8
        9 8 5 1 1 1 0 0 | 8 | 0 0 3 4 7 8 8 9 9
                    2 0 | 9 | 1 3 9

Key: 0|9|1 = 90 for traditional and 91 for interactive

At α = 0.01, can you support the claim that the mean science test score is lower for students taught using the traditional lab method than it is for students taught using the interactive simulation software? Assume the population variances are equal.




Sample Statistics for Finishing Times of 10K Race Participants

Males           Females
s₁ = 0.17 h     s₂ = 0.04 h
n₁ = 20         n₂ = 12

TABLE FOR EXERCISE 23


Sample Statistics for Driving Distances

Golfer 1        Golfer 2
s₁ = 6 yd       s₂ = 12 yd
n₁ = 9          n₂ = …

TABLE FOR EXERCISE 24


Sample Statistics for Number of Days Waiting for an Appointment with a Family Doctor

Miami               Seattle
x̄₁ = 28 days        x̄₂ = 26 days
s₁ = 39.7 days      s₂ = 42.4 days
n₁ = 20             n₂ = 7

TABLE FOR EXERCISE 25


Extending Concepts 


Constructing Confidence Intervals for μ₁ − μ₂ When the sampling distribution for x̄₁ − x̄₂ is approximated by a t-distribution and the population variances are not equal, you can construct a confidence interval for μ₁ − μ₂, as shown below.

$$(\bar{x}_1 - \bar{x}_2) - t_c\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + t_c\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

where d.f. is the smaller of n₁ − 1 and n₂ − 1.
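The interval above can be sketched as follows, assuming the critical value t_c has already been read from Table 5 for d.f. equal to the smaller of n₁ − 1 and n₂ − 1. The function name `ci_unequal_var` is hypothetical.

```python
import math


def ci_unequal_var(x1, s1, n1, x2, s2, n2, tc):
    """Return (lower, upper) bounds for mu1 - mu2, variances not assumed equal.

    tc: critical value from a t-table for d.f. = min(n1 - 1, n2 - 1)
        at the chosen confidence level.
    """
    margin = tc * math.sqrt(s1**2 / n1 + s2**2 / n2)  # margin of error
    diff = x1 - x2                                     # point estimate
    return diff - margin, diff + margin
```

As an illustration with the Example 1 statistics (not one of the exercises), `ci_unequal_var(473, 39.7, 8, 459, 24.5, 18, tc=1.895)` gives a 90% interval of roughly −14.76 < μ₁ − μ₂ < 42.76. The interval contains 0, which agrees with the decision in that example to fail to reject H₀ at α = 0.10.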


In Exercises 23 and 24, construct the indicated confidence interval for μ₁ − μ₂. Assume the populations are approximately normal with unequal variances.


23. 10K Race To compare the mean finishing times of male and female 
participants in a 10K race, you randomly select several finishing times from 
both sexes. The results are shown at the left. Construct an 80% confidence 
interval for the difference in mean finishing times of male and female 
participants in the race. (Adapted from Xact) 


24. Golf To compare the mean driving distances for two golfers, you randomly 
select several drives from each golfer. The results are shown at the left. 
Construct a 90% confidence interval for the difference in mean driving 
distances for the two golfers. 


Constructing Confidence Intervals for μ₁ − μ₂ When the sampling distribution for x̄₁ − x̄₂ is approximated by a t-distribution and the populations have equal variances, you can construct a confidence interval for μ₁ − μ₂, as shown below.

$$(\bar{x}_1 - \bar{x}_2) - t_c\,\hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + t_c\,\hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

where

$$\hat{\sigma} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

and d.f. = n₁ + n₂ − 2.

In Exercises 25 and 26, construct the indicated confidence interval for μ₁ − μ₂. Assume the populations are approximately normal with equal variances.


25. Family Doctor To compare the mean number of days spent waiting to see 
a family doctor for two large cities, you randomly select several people in 
each city who have had an appointment with a family doctor. The results 
are shown at the left. Construct a 90% confidence interval for the difference 
in mean number of days spent waiting to see a family doctor for the two 
cities. (Adapted from Merritt Hawkins) 


26. 10K Race To compare the mean ages of male and female participants in 
a 10K race, you randomly select several ages from both sexes. The results 
are shown below. Construct a 95% confidence interval for the difference in 
mean ages of male and female participants in the race. (Adapted from Xact) 


Sample Statistics for Ages of 10K Race Participants

Males               Females
x̄₁ = 41 years       x̄₂ = 41 years
s₁ = 13.7 years     s₂ = 14.4 years
n₁ = 20             n₂ = 12


How Protein Affects Weight Gain in Overeaters 


In a study published in the Journal of the American Medical Association, three 
groups of 18- to 35-year-old participants overate for an 8-week period. The groups 
consumed different levels of protein in their diet. The low protein group’s diet was 
5% protein, the normal protein group’s diet was 15% protein, and the high protein 
group’s diet was 25% protein. The study found that the low protein group gained 
considerably less weight than the normal protein group or the high protein group. 

You are a scientist working at a health research firm. The firm wants you to 
replicate the experiment. You conduct a similar experiment over an 8-week period. 
The results of the experiment are shown below. 


Low protein group    Normal protein group    High protein group
x̄₁ = 6.8 lb          x̄₂ = 13.5 lb            x̄₃ = 14.2 lb
s₁ = 1.7 lb          s₂ = 2.5 lb             s₃ = … lb
n₁ = 12              n₂ = 16                 n₃ = 15


In Exercises 1-3, perform a two-sample t-test to determine whether the mean weight gains of the two indicated groups are different. Assume the populations are normally distributed and the population variances are equal. For each exercise, write your conclusions as a sentence. Use α = 0.05.

1. Test the weight gains of the low protein group against those in the normal protein group.

2. Test the weight gains of the low protein group against those in the high protein group.

3. Test the weight gains of the normal protein group against those in the high protein group.

4. In which comparisons in Exercises 1-3 did you find a difference in weight gains? Write a summary of your findings.

5. Construct a 95% confidence interval for μ₁ − μ₂, where μ₁ is the mean weight gain in the normal protein group and μ₂ is the mean weight gain in the high protein group. Assume the populations are normally distributed and the population variances are equal. (See Extending Concepts in Section 8.2 Exercises.)


What You Should Learn 


» How to perform a t-test to test the mean of the differences for a population of paired data


Study Tip
Recall from Section 8.1 
that two samples are 
dependent when each 
member of one sample 
corresponds to a member 
of the other sample. 


Study Tip 


You can also calculate the standard deviation of the differences between paired data entries using the alternate formula

$$s_d = \sqrt{\frac{\sum d^2 - \frac{(\sum d)^2}{n}}{n - 1}}$$


8.3 Testing the Difference Between Means (Dependent Samples)




SECTION 8.3 Testing the Difference Between Means (Dependent Samples) 


The t-Test for the Difference Between Means 


The t-Test for the Difference Between Means 


In Sections 8.1 and 8.2, you performed two-sample hypothesis tests with independent samples using the test statistic x̄₁ − x̄₂ (the difference between the means of the two samples). To perform a two-sample hypothesis test with dependent samples, you will use a different technique. You will first find the difference d for each data pair.

d = (data entry in first sample) − (corresponding data entry in second sample)

The test statistic is the mean d̄ of these differences:

$$\bar{d} = \frac{\sum d}{n}$$
(Mean of the differences between paired data entries in the dependent samples)


These conditions are necessary to conduct the test. 


1. The samples are randomly selected. 

2. The samples are dependent (paired). 

3. The populations are normally distributed or the number n of pairs of data is 
at least 30. 


When these conditions are met, the sampling distribution for d̄, the mean of the differences of the paired data entries in the dependent samples, is approximated by a t-distribution with n − 1 degrees of freedom, where n is the number of data pairs.


The symbols listed in the table are used for the t-test for μ_d. Although formulas are given for the mean and standard deviation of differences, you should use technology to calculate these statistics.

Symbol   Description
n        The number of pairs of data
d        The difference between entries in a data pair
μ_d      The hypothesized mean of the differences of paired data in the population
d̄        The mean of the differences between the paired data entries in the dependent samples
         $$\bar{d} = \frac{\sum d}{n}$$
s_d      The standard deviation of the differences between the paired data entries in the dependent samples
         $$s_d = \sqrt{\frac{\sum (d - \bar{d})^2}{n - 1}}$$
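Although the text recommends technology for these statistics, both formulas are easy to sketch. The function below is a hypothetical helper (`difference_stats`) that computes d̄ and s_d with d = (first entry) − (second entry), and also evaluates the alternate shortcut formula for s_d so the two results can be compared.

```python
import math


def difference_stats(first, second):
    """Return (n, d_bar, s_d, s_d_alt) for paired samples, with d = first - second."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    d = [a - b for a, b in zip(first, second)]
    n = len(d)
    d_bar = sum(d) / n
    # Definitional formula: sqrt(sum of squared deviations / (n - 1)).
    s_d = math.sqrt(sum((x - d_bar) ** 2 for x in d) / (n - 1))
    # Alternate (shortcut) formula: sqrt((sum d^2 - (sum d)^2 / n) / (n - 1)).
    s_d_alt = math.sqrt((sum(x * x for x in d) - sum(d) ** 2 / n) / (n - 1))
    return n, d_bar, s_d, s_d_alt
```

With the vertical jump data from Example 1 (before: 24, 22, 25, 28, 35, 32, 30, 27; after: 26, 25, 25, 29, 33, 34, 35, 30), both formulas give s_d = √4.5 ≈ 2.1213, and d̄ = −1.75.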




When you use a t-distribution to approximate the sampling distribution for d̄, the mean of the differences between paired data entries, you can use a t-test to test a claim about the mean of the differences for a population of paired data.

t-Test for the Difference Between Means

A t-test can be used to test the difference of two population means when these conditions are met.

1. The samples are random.
2. The samples are dependent (paired).
3. The populations are normally distributed or n ≥ 30.

The test statistic is

$$\bar{d} = \frac{\sum d}{n}$$

and the standardized test statistic is

$$t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}}.$$

The degrees of freedom are d.f. = n − 1.

Picturing the World

The manufacturer of an appetite suppressant claims that when its product is taken while following a low-fat diet with regular exercise for 4 months, the average weight loss is 20 pounds. To test this claim, you studied 12 randomly selected dieters taking an appetite suppressant for 4 months. The dieters followed a low-fat diet with regular exercise for all 4 months. The results are shown in the table. (Adapted from NetHealth, Inc.)

Weights (in pounds) of 12 Dieters

Dieter   Original weight   Weight after 4th month
1        185               168
2        194               177
3        213               196
4        198               180
5        …                 229
6        162               144
7        …                 197
8        …                 …
9        178               …
10       …                 …
11       181               161
12       209               193

At α = …, does the study provide enough evidence to reject the manufacturer’s claim? Assume the weights are normally distributed.

GUIDELINES

Using the t-Test for the Difference Between Means (Dependent Samples)

In Words | In Symbols

1. Verify that the samples are random and dependent, and either the populations are normally distributed or n ≥ 30.
2. State the claim mathematically and verbally. Identify the null and alternative hypotheses. | State H₀ and Hₐ.
3. Specify the level of significance. | Identify α.
4. Identify the degrees of freedom. | d.f. = n − 1
5. Determine the critical value(s). | Use Table 5 in Appendix B.
6. Determine the rejection region(s).
7. Calculate d̄ and s_d. | $\bar{d} = \frac{\sum d}{n}$ and $s_d = \sqrt{\frac{\sum (d - \bar{d})^2}{n - 1}}$
8. Find the standardized test statistic and sketch the sampling distribution. | $t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}}$
9. Make a decision to reject or fail to reject the null hypothesis. | If t is in the rejection region, then reject H₀. Otherwise, fail to reject H₀.
10. Interpret the decision in the context of the original claim.
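Steps 4 through 9 of these guidelines can be sketched in Python in the same hedged way as the two-sample case: `paired_t_test` is a hypothetical name, the critical value t₀ still comes from Table 5 because the standard library has no t-distribution, and the differences are taken as d = (first sample) − (second sample), matching the convention in this section.

```python
import math


def paired_t_test(first, second, t0, tail, mu_d=0.0):
    """Sketch of a paired t-test; returns (t, d.f., reject H0?).

    first, second: paired data entries, with d = first - second
    t0  : positive critical value from a t-table for d.f. = n - 1
    tail: "left", "right", or "two"; mu_d: hypothesized mean difference
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    d = [a - b for a, b in zip(first, second)]
    n = len(d)
    d_bar = sum(d) / n                                           # step 7
    s_d = math.sqrt(sum((x - d_bar) ** 2 for x in d) / (n - 1))  # step 7
    t = (d_bar - mu_d) / (s_d / math.sqrt(n))                    # step 8
    if tail == "left":                                           # step 9
        reject = t < -t0
    elif tail == "right":
        reject = t > t0
    elif tail == "two":
        reject = abs(t) > t0
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return t, n - 1, reject
```

For the vertical jump data in Example 1 below, `paired_t_test(before, after, t0=1.415, tail="left")` gives t ≈ −2.333 with 7 degrees of freedom and rejects H₀, in line with the hand calculation.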


Study Tip

To simplify the calculation of t, you can round the values of d̄ and s_d to four decimal places, as shown in Examples 1 and 2.


Before   After   d    d²
24       26      −2   4
22       25      −3   9
25       25      0    0
28       29      −1   1
35       33      2    4
32       34      −2   4
30       35      −5   25
27       30      −3   9


Study Tip 


You can also use a P-value 
to perform a hypothesis 
test for the difference 
between means. For 
instance, in Example 1, 
you can enter the data in 
Minitab (as shown on page 486) and 
find P = 0.026. Because P < a, you 
reject the null hypothesis. 




See Minitab steps 
on page 486. 


EXAMPLE 1
The t-Test for the Difference Between Means


A shoe manufacturer claims that athletes can increase their vertical jump 
heights using the manufacturer’s training shoes. The vertical jump heights 
of eight randomly selected athletes are measured. After the athletes have 
used the shoes for 8 months, their vertical jump heights are measured again. 
The vertical jump heights (in inches) for each athlete are shown in the table. 
At α = 0.10, is there enough evidence to support the manufacturer’s claim?
Assume the vertical jump heights are normally distributed. (Adapted from 
Coaches Sports Publishing) 


Athlete                                      1    2    3    4    5    6    7    8
Vertical jump height (before using shoes)    24   22   25   28   35   32   30   27
Vertical jump height (after using shoes)     26   25   25   29   33   34   35   30

SOLUTION 


Because the samples are random and dependent, and the populations are normally distributed, you can use the t-test. The claim is that “athletes can increase their vertical jump heights.” In other words, the manufacturer claims that an athlete’s vertical jump height before using the shoes will be less than the athlete’s vertical jump height after using the shoes. Each difference is given by

d = (jump height before shoes) − (jump height after shoes).

The null and alternative hypotheses are

H₀: μ_d ≥ 0 and Hₐ: μ_d < 0. (Claim)


Because the test is a left-tailed test, α = 0.10, and d.f. = 8 − 1 = 7, the critical value is t₀ = −1.415. The rejection region is t < −1.415. Using the table at the left, you can calculate d̄ and s_d as shown below. Notice that the alternate formula is used to calculate the standard deviation.

$$\bar{d} = \frac{\sum d}{n} = \frac{-14}{8} = -1.75$$

$$s_d = \sqrt{\frac{\sum d^2 - \frac{(\sum d)^2}{n}}{n - 1}} = \sqrt{\frac{56 - \frac{(-14)^2}{8}}{8 - 1}} \approx 2.1213$$

The standardized test statistic is

$$t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}} = \frac{-1.75 - 0}{2.1213 / \sqrt{8}} \approx -2.333.$$

(Figure: left-tailed rejection region t < −1.415 with the standardized test statistic t ≈ −2.333)


The figure shows the location of the rejection region and the standardized test statistic t. Because t is in the rejection region, you reject the null hypothesis.
Interpretation There is enough evidence at the 10% level of significance to 
support the shoe manufacturer’s claim that athletes can increase their vertical 
jump heights using the manufacturer’s training shoes. 


Tech Tip

One way to use technology to perform a hypothesis test for the difference between means is to enter the data in two columns and form a third column in which you calculate the difference for each pair. You can now perform a one-sample t-test on the difference column, as shown in Chapter 7.

TRY IT YOURSELF 1

A shoe manufacturer claims that athletes can decrease their times in the 40-yard dash using the manufacturer’s training shoes. The 40-yard dash times of 12 randomly selected athletes are measured. After the athletes have used the shoes for 8 months, their 40-yard dash times are measured again. The times (in seconds) are listed in the table below. At α = 0.05, is there enough evidence to support the manufacturer’s claim? Assume the times are normally distributed. (Adapted from Coaches Sports Publishing)

Athlete                                    1     2     3     4     5     6     7     8     9     10    11    12
40-yard dash time (before using shoes)     4.85  4.90  5.08  4.72  4.62  4.54  5.25  5.18  4.81  4.57  4.63  4.77
40-yard dash time (after using shoes)      4.78  4.90  5.05  4.65  4.64  4.50  5.24  5.27  4.75  4.43  4.61  4.82

Answer: Page A37


Note in Example 1 that it is possible the vertical jump height improved because 
of other reasons. Many advertisements misuse statistical results by implying a 
cause-and-effect relationship that has not been substantiated by testing. 


EXAMPLE 2
The t-Test for the Difference Between Means


The campaign staff for a state legislator wants to determine whether the legislator’s performance rating (0-100) has changed from last year to this year. The table below shows the legislator’s performance ratings from the same 16 randomly selected voters for last year and this year. At α = 0.01, is there enough evidence to conclude that the legislator’s performance rating has changed? Assume the performance ratings are normally distributed.


Voter | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Rating (last year) | 60 | 54 | 78 | 84 | 91 | 25 | 50 | 65
Rating (this year) | 56 | 48 | 70 | 60 | 85 | 40 | 40 | 55
Voter | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16
Rating (last year) | 68 | 81 | 75 | 45 | 62 | 79 | 58 | 63
Rating (this year) | 80 | 75 | 78 | 50 | 50 | 85 | 53 | 60


SOLUTION 


Because the samples are random and dependent, and the populations
are normally distributed, you can use the t-test. If there is a change in the
legislator’s rating, then there will be a difference between last year’s ratings
and this year’s ratings. Because the legislator wants to determine whether
there is a difference, the null and alternative hypotheses are

$$H_0\colon \mu_d = 0 \quad\text{and}\quad H_a\colon \mu_d \neq 0 \text{ (Claim)}.$$

Because the test is a two-tailed test, $\alpha = 0.01$, and d.f. $= 16 - 1 = 15$, the
critical values are $-t_0 = -2.947$ and $t_0 = 2.947$. The rejection regions are

$$t < -2.947 \quad\text{and}\quad t > 2.947.$$


SECTION 8.3 _ Testing the Difference Between Means (Dependent Samples) 463 


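Using the table below, you can calculate $\bar{d}$ and $s_d$ as shown.


Before | After | $d$ | $d^2$
60 | 56 | 4 | 16
54 | 48 | 6 | 36
78 | 70 | 8 | 64
84 | 60 | 24 | 576
91 | 85 | 6 | 36
25 | 40 | -15 | 225
50 | 40 | 10 | 100
65 | 55 | 10 | 100
68 | 80 | -12 | 144
81 | 75 | 6 | 36
75 | 78 | -3 | 9
45 | 50 | -5 | 25
62 | 50 | 12 | 144
79 | 85 | -6 | 36
58 | 53 | 5 | 25
63 | 60 | 3 | 9
Totals | | $\Sigma d = 53$ | $\Sigma d^2 = 1581$

$$\bar{d} = \frac{\Sigma d}{n} = \frac{53}{16} = 3.3125$$

$$s_d = \sqrt{\frac{n(\Sigma d^2) - (\Sigma d)^2}{n(n-1)}} = \sqrt{\frac{16(1581) - 53^2}{16(15)}} \approx 9.6797$$

The standardized test statistic is given by the t-test, assuming $\mu_d = 0$:

$$t = \frac{\bar{d} - \mu_d}{s_d/\sqrt{n}} = \frac{3.3125 - 0}{9.6797/\sqrt{16}} \approx 1.369.$$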
You can check this result using technology. The StatCrunch output is shown below.


STATCRUNCH

Paired T hypothesis test:
μD = μ1 - μ2 : Mean of the difference between Last year and This year
H0 : μD = 0
HA : μD ≠ 0

Hypothesis test results:
Difference | Mean | Std. Err. | DF | T-Stat | P-value
Last year - This year | 3.3125 | 2.4199152 | 15 | 1.3688496 | 0.1912


The figure at the left shows the locations of the rejection regions and the
standardized test statistic $t$. Because $t$ is not in a rejection region, you fail to
reject the null hypothesis.

[Figure: t-distribution with $1 - \alpha = 0.99$ in the middle, rejection regions $t < -2.947$ and $t > 2.947$, and $t \approx 1.369$ lying between them]

Interpretation There is not enough evidence at the 1% level of significance 
to conclude that the legislator’s performance rating has changed. 
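The hand calculation above can be checked with a short sketch. It applies the shortcut formula for $s_d$ to the column totals $\Sigma d = 53$ and $\Sigma d^2 = 1581$, then compares $t$ with the critical values $\pm 2.947$ from the solution. The variable names are illustrative assumptions.

```python
import math

# Column totals from the legislator table (illustrative variable names)
n = 16         # number of voters (pairs)
sum_d = 53     # sum of d = (last year) - (this year)
sum_d2 = 1581  # sum of d squared

d_bar = sum_d / n
# Shortcut formula: s_d = sqrt((n * sum(d^2) - (sum d)^2) / (n * (n - 1)))
s_d = math.sqrt((n * sum_d2 - sum_d ** 2) / (n * (n - 1)))
t = (d_bar - 0) / (s_d / math.sqrt(n))  # mu_d = 0 is assumed under H0

t0 = 2.947  # two-tailed critical value for alpha = 0.01, d.f. = 15 (from the text)
reject_h0 = t < -t0 or t > t0
print(f"d-bar = {d_bar:.4f}, s_d = {s_d:.4f}, t = {t:.3f}, reject H0: {reject_h0}")
```

It prints `d-bar = 3.3125, s_d = 9.6797, t = 1.369, reject H0: False`. This agrees with the decision to fail to reject the null hypothesis.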


TRY IT YOURSELF 2 


A medical researcher wants to determine whether a drug changes the
body’s temperature. Seven test subjects are randomly selected, and the body
temperature (in degrees Fahrenheit) of each is measured. The subjects are
then given the drug and, after 20 minutes, the body temperature of each is
measured again. The results are listed below. At $\alpha = 0.05$, is there enough
evidence to conclude that the drug changes the body’s temperature? Assume
the body temperatures are normally distributed.


Subject | 1 | 2 | 3 | 4 | 5 | 6 | 7
Initial temperature | 101.8 | 98.5 | 98.1 | 99.4 | 98.9 | 100.2 | 97.9
Second temperature | 99.2 | 98.4 | 98.2 | 99.0 | 98.6 | 99.7 | 97.8


Answer: Page A37 




8.3 EXERCISES


Building Basic Skills and Vocabulary 


1. What conditions are necessary in order to use the dependent samples t-test
for the mean of the differences for a population of paired data?


2. Explain what the symbols $\bar{d}$ and $s_d$ represent.


In Exercises 3-8, test the claim about the mean of the differences for a population
of paired data at the level of significance $\alpha$. Assume the samples are random and
dependent, and the populations are normally distributed.


3. Claim: $\mu_d < 0$; $\alpha = 0.05$. Sample statistics: $\bar{d} = 1.5$, $s_d = 3.2$, $n = 14$
4. Claim: $\mu_d = 0$; $\alpha = 0.01$. Sample statistics: $\bar{d} = 3.2$, $s_d = 8.45$, $n = 8$
5. Claim: $\mu_d = 0$; $\alpha = 0.10$. Sample statistics: $\bar{d} = 6.5$, $s_d = 9.54$, $n = 16$
6. Claim: $\mu_d > 0$; $\alpha = 0.05$. Sample statistics: $\bar{d} = 0.55$, $s_d = 0.99$, $n = 28$
7. Claim: $\mu_d = 0$; $\alpha = 0.01$. Sample statistics: $\bar{d} = -2.3$, $s_d = 1.2$, $n = 15$
8. Claim: μd ~ 0; $\alpha = 0.10$. Sample statistics: $\bar{d} = -1$, $s_d = 2.75$, $n = 20$


Using and Interpreting Concepts 


Testing the Difference Between Two Means In Exercises 9-20,
(a) identify the claim and state $H_0$ and $H_a$, (b) find the critical value(s) and
identify the rejection region(s), (c) calculate $\bar{d}$ and $s_d$, (d) find the standardized
test statistic $t$, (e) decide whether to reject or fail to reject the null hypothesis, and
(f) interpret the decision in the context of the original claim. Assume the samples
are random and dependent, and the populations are normally distributed.


9. Dow Jones Stocks A stock market analyst claims that seven of the stocks
that make up the Dow Jones Industrial Average lost value from one hour
to the next on one business day. The table shows the prices (in dollars per
share) of the seven stocks at one time during the day and then an hour
later. At $\alpha = 0.01$, is there enough evidence to support the analyst’s claim?
(Source: MarketWatch)


Stock | 1 | 2 | 3 | 4 | 5 | 6 | 7
Price (first hour) | 183.72 | 31.46 | 64.26 | 68.11 | 174.53 | 35.57 | 92.85
Price (second hour) | 182.85 | 31.62 | 64.20 | 68.20 | 174.18 | 35.65 | 93.19
10. SAT Scores An instructor for an SAT preparation course claims that
the course will improve the test scores of students. The table shows the
critical reading scores for 10 students the first two times they took the
SAT. Before taking the SAT for the second time, the students took the
instructor’s course to try to improve their critical reading SAT scores.
At $\alpha = 0.01$, is there enough evidence to support the instructor’s claim?

Student | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Score (first) | 300 | 450 | 350 | 430 | 300 | 470 | 420 | 370 | 320 | 410
Score (second) | 400 | 520 | 400 | 490 | 340 | 580 | 450 | 400 | 390 | 450




11. Caffeine Ingestion A researcher claims that caffeine ingestion improves
repeated freestyle sprints in trained male swimmers. The table shows the
mean performance times (in seconds) for a group of trained male swimmers
who complete six 75-meter maximal freestyle sprints after ingesting a
placebo and after ingesting caffeine. At $\alpha = 0.01$, is there enough evidence
to support the researcher’s claim? (Source: Journal of Sports Science & Medicine)


Sprint number | 1 | 2 | 3 | 4 | 5 | 6
Sprint time (placebo) | 40.2 | 40.3 | 40.7 | 41.0 | 40.7 | 40.6
Sprint time (caffeine) | 39.9 | 39.9 | 39.7 | 40.1 | 40.2 | 40.5


12. Batting Averages A coach claims that a baseball clinic will help
players raise their batting averages. The table shows the batting
averages of 14 players before participating in the clinic and two months
after participating in the clinic. At $\alpha = 0.05$, is there enough evidence
to support the coach’s claim?


Player | 1 | 2 | 3 | 4 | 5 | 6 | 7
Batting average (before clinic)
Batting average (after clinic)
Player | 8 | 9 | 10 | 11 | 12 | 13 | 14
Batting average (before clinic) | 0.350 | 0.380 | 0.316 | 0.270 | 0.300 | 0.330 | 0.340
Batting average (after clinic) | 0.345 | 0.380 | 0.315 | 0.280 | 0.282 | 0.336 | 0.325


13. Headaches A physical therapist suggests that soft tissue massage
therapy helps to reduce the numbers of days per week patients suffer
from headaches. The table shows the numbers of days per week
18 patients suffered from headaches before and after 6 weeks of
receiving massage therapy. At $\alpha = 0.01$, is there enough evidence to
support the therapist’s claim? (Adapted from Annals of Musculoskeletal
Medicine)


Patient | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Days (before) | 4 | 5 | 6 |  | 5 | 6 | 5 | 4 | 4
Days (after) | 5 | 3 | 3 | 4 | 3 | 4 | 3 | 5 | 2
Patient | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18
Days (before)
Days (after)




14. Therapeutic Taping A physical therapist claims that the use of a
specific type of therapeutic tape reduces pain in patients with chronic
tennis elbow. The table shows the pain levels on a scale of 0 to 10,
where 0 is no pain and 10 is the worst pain possible, for 15 patients with
chronic tennis elbow when holding a 1-kilogram weight. At $\alpha = 0.05$, is
there enough evidence to support the therapist’s claim? (Adapted from
BioMed Central, Ltd.)


Patient | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Pain level (before taping) | 4
Pain level (after taping) | 5
Patient | 9 | 10 | 11 | 12 | 13 | 14 | 15
Pain level (before taping) | 1
Pain level (after taping) | 0


15. Student Housing A college administrator suggests that student
housing rates have increased from one academic year to the
next. The table shows the rates (in dollars per academic year) for
12 student housing arrangement options in two consecutive academic
years. At $\alpha = 0.05$, is there enough evidence to support the college
administrator’s claim? (Source: The University of Kansas)


Option 1
Rate (first year) 4488
Rate (second year) 4616

Option 7
Rate (first year) 7288
Rate (second year) 7518


9230 
9516 


5738 
5910 


4 
6064 
6246 


10 
5738 
5910 


5 
6150 
6246 


11 
6064 
6246 


12 
7288 
8454 


16. PhD Stipends An education researcher claims that stipends for PhD
students have increased from one academic year to the next. The table
shows the PhD stipends (in dollars per academic year) for seven different
fields of study at various institutions in two consecutive academic years. At
$\alpha = 0.05$, is there enough evidence to support the education researcher’s
claim? (Source: PhDStipends.com)


Field of Study 1 2 
Stipend 
(first year) 30,600 | 29,658 


Stipend
(second year) 32,850 | 30,770


15,200 


19,800 


5 
26,233 | 11,900 
27,100 | 11,000 


33,000 


33,000 


33,590 


30,312 




17. Product Ratings A company claims that its consumer product
ratings (0-10) have changed from last year to this year. The table shows
the company’s product ratings from the same eight consumers for last
year and this year. At $\alpha = 0.05$, is there enough evidence to support the
company’s claim?


Consumer | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Rating (last year) | 5 | 7 | 2 | 3 | 9 | 10 | 8 | 7
Rating (this year) | 5 | 9 | 4 | 6 | 9 | 9 | 9 | 8


18. Pass Completion Percentages The pass completion percentages of
10 college football quarterbacks for their freshman and sophomore
seasons are shown in the table below. At $\alpha = 0.10$, is there enough
evidence to support the claim that the pass completion percentages
have changed? (Source: Sports Reference, LLC)


Player | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Pass completion percentage (freshman) | 67.9 | 61.5 | 56.8 | 60.0 | 63.6 | 50.0 | 57.0 | 63.1 | 54.7 | 58.5
Pass completion percentage (sophomore) | 67.8 | 56.5 | 63.5 | 60.7 | 61.9 | 57.9 | 62.3 | 62.1 | 56.2 | 61.2


19. Fiber Content A cookie manufacturer claims that its new cookies have
more fiber than the ones produced by its competitor. The table shows the
fiber content in seven samples of each type of cookies produced by the
manufacturer and its competitor. At $\alpha = 0.01$, is there enough evidence to
support the cookie manufacturer’s claim?


Sample | 1 | 2 | 3 | 4 | 5 | 6 | 7
Fiber in manufacturer’s sample
Fiber in competitor’s sample


20. Derby In a derby, eight jockeys are riding a few exceptionally well-trained
horses. As part of the race, the horses are timed as they run an obstacle
course. The table shows the times (in seconds) of the horses in the first
lap and the final lap of the race. At $\alpha = 0.05$, is there enough evidence to
support the claim that the horses’ running times have changed?


Horse | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Time (first lap) | 125.6 | 136.4 | 129.8 | 135.4 | 121.0 | 136.7 | 175.4 | 146.8
Time (final lap) | 128.7 | 135.5 | 139.1 | 124.7 | 139.8 | 129.9 | 161.6 | 151.7




Extending Concepts 


21. In Exercise 15, use technology to perform the hypothesis test with a P-value. 
Compare your result with the result obtained using rejection regions. Are 
they the same? 


22. In Exercise 18, use technology to perform the hypothesis test with a P-value. 
Compare your result with the result obtained using rejection regions. Are 
they the same? 


Constructing Confidence Intervals for $\mu_d$ To construct a confidence
interval for $\mu_d$, use the inequality below.

$$\bar{d} - t_c\frac{s_d}{\sqrt{n}} < \mu_d < \bar{d} + t_c\frac{s_d}{\sqrt{n}}$$
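Here is a minimal sketch of this inequality. It assumes you have already looked up $t_c$ in a t-table, because Python's standard library has no t-distribution quantile function. The function name `mean_difference_ci` is hypothetical. The illustration reuses the legislator statistics ($\bar{d} = 3.3125$, $s_d \approx 9.6797$, $n = 16$) with $t_c = 2.947$ for 99% confidence and d.f. = 15.

```python
import math


def mean_difference_ci(d_bar, s_d, n, t_c):
    """Return (lower, upper) for d_bar - E < mu_d < d_bar + E,
    where E = t_c * s_d / sqrt(n). t_c must match d.f. = n - 1
    and the desired confidence level."""
    margin = t_c * s_d / math.sqrt(n)
    return d_bar - margin, d_bar + margin


low, high = mean_difference_ci(3.3125, 9.6797, 16, 2.947)
print(f"{low:.3f} < mu_d < {high:.3f}")
```

This prints `-3.819 < mu_d < 10.444`. The interval contains 0, which agrees with failing to reject $H_0\colon \mu_d = 0$ at $\alpha = 0.01$ in the legislator example.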


In Exercises 23 and 24, construct the indicated confidence interval for $\mu_d$. Assume
the populations are normally distributed.


23. Drug Testing A sleep disorder specialist wants to test the effectiveness
of a new drug that is reported to increase the number of hours of sleep
patients get during the night. To do so, the specialist randomly selects
16 patients and records the number of hours of sleep each gets with
and without the new drug. The table shows the results of the two-night
study. Construct a 90% confidence interval for $\mu_d$.


Patient | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Hours of sleep (without the drug) | 1.8 | 2.0 | 3.4 | 3.5 | 3.7 | 3.8 | 3.9 | 3.9
Hours of sleep (using the drug) | 3.0 | 3.6 | 4.0 | 4.4 | 4.5 | 5.2 | 5.5 | 5.7
Patient | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16
Hours of sleep (without the drug) | 4.0 | 4.9 | 5.1 | 5.2 | 5.0 | 4.5 | 4.2 | 4.7
Hours of sleep (using the drug) | 6.2 | 6.3 | 6.6 | 7.8 | 7.2 | 6.5 | 5.6 | 5.9

24. Herbal Medicine Testing A sleep disorder specialist wants to test
whether herbal medicine increases the number of hours of sleep
patients get during the night. To do so, the specialist randomly selects
14 patients and records the number of hours of sleep each gets with
and without the herbal medicine. The table shows the results of the two-night
study. Construct a 95% confidence interval for $\mu_d$.


Patient | 1 | 2 | 3 | 4 | 5 | 6 | 7
Hours of sleep (without medicine) | 1.0 | 1.4 | 3.4 | 3.7 | 5.1 | 5.1 | 5.2
Hours of sleep (using medicine)
Patient | 8 | 9 | 10 | 11 | 12 | 13 | 14
Hours of sleep (without medicine) | 5.3 | 5.5 | 5.8 | 4.2 | 4.8 | 2.9 | 4.5
Hours of sleep (using medicine)


8.4 Testing the Difference Between Proportions


What You Should Learn

» How to perform a two-sample z-test for the difference between two population proportions $p_1$ and $p_2$


Study Tip

You can also write the null and alternative hypotheses as shown below.

$H_0\colon p_1 - p_2 = 0$ and $H_a\colon p_1 - p_2 \neq 0$
$H_0\colon p_1 - p_2 \le 0$ and $H_a\colon p_1 - p_2 > 0$
$H_0\colon p_1 - p_2 \ge 0$ and $H_a\colon p_1 - p_2 < 0$


Study Tip

The symbols in the table below are used in the z-test for $p_1 - p_2$. See Sections 4.2 and 5.5 to review the binomial distribution.

Symbol | Description
$p_1, p_2$ | Population proportions
$x_1, x_2$ | Number of successes in each sample
$n_1, n_2$ | Size of each sample
$\hat{p}_1, \hat{p}_2$ | Sample proportions of successes
$\bar{p}$ | Weighted estimate of $p_1$ and $p_2$
$\bar{q}$ | Weighted estimate of $q_1$ and $q_2$, $\bar{q} = 1 - \bar{p}$




SECTION 8.4 _ Testing the Difference Between Proportions 


Two-Sample z-Test for the Difference Between Proportions 




In this section, you will learn how to use a z-test to test the difference between
two population proportions $p_1$ and $p_2$ using a sample proportion from each
population. If a claim is about two population parameters $p_1$ and $p_2$, then some
possible pairs of null and alternative hypotheses are


$$H_0\colon p_1 = p_2 \qquad H_0\colon p_1 \le p_2 \qquad H_0\colon p_1 \ge p_2$$
$$H_a\colon p_1 \neq p_2 \qquad H_a\colon p_1 > p_2 \qquad H_a\colon p_1 < p_2$$

Regardless of which hypotheses you use, you always assume there is no difference
between the population proportions ($p_1 = p_2$).

For instance, suppose you want to determine whether the proportion of 
female college students who earn a bachelor’s degree in four years is different 
from the proportion of male college students who earn a bachelor’s degree 
in four years. These conditions are necessary to use a z-test to test such 
a difference. 


1. The samples are randomly selected.
2. The samples are independent.
3. The samples are large enough to use a normal sampling distribution. That is,
$n_1p_1 \ge 5$, $n_1q_1 \ge 5$, $n_2p_2 \ge 5$, and $n_2q_2 \ge 5$.

When these conditions are met, the sampling distribution for $\hat{p}_1 - \hat{p}_2$, the
difference between the sample proportions, is a normal distribution with mean

$$\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$$

and standard error

$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}.$$

Notice that you need to know the population proportions to calculate the
standard error. Because a hypothesis test for $p_1 - p_2$ is based on the assumption
that $p_1 = p_2$, you can calculate a weighted estimate of $p_1$ and $p_2$ using

$$\bar{p} = \frac{x_1 + x_2}{n_1 + n_2}$$

where $x_1 = n_1\hat{p}_1$ and $x_2 = n_2\hat{p}_2$. With the weighted estimate $\bar{p}$, the standard
error of the sampling distribution for $\hat{p}_1 - \hat{p}_2$ is

$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\bar{p}\,\bar{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

where $\bar{q} = 1 - \bar{p}$.

Also, you need to know the population proportions to verify that the
samples are large enough to be approximated by the normal distribution. But
when determining whether the z-test can be used for the difference between
proportions for a binomial experiment, you should use $\bar{p}$ in place of $p_1$ and $p_2$
and use $\bar{q}$ in place of $q_1$ and $q_2$.
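The weighted estimate, the pooled standard error, and the at-least-5 check can be written as three small helper functions. This is a sketch with hypothetical function names. The usage lines plug in the seat-belt counts ($x_1 = 182$, $n_1 = 200$, $x_2 = 208$, $n_2 = 250$) from the example later in this section.

```python
import math


def pooled_estimate(x1, n1, x2, n2):
    """Weighted estimate p_bar = (x1 + x2) / (n1 + n2) and q_bar = 1 - p_bar."""
    p_bar = (x1 + x2) / (n1 + n2)
    return p_bar, 1 - p_bar


def pooled_standard_error(p_bar, q_bar, n1, n2):
    """Standard error of p1_hat - p2_hat when p1 = p2 is assumed."""
    return math.sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))


def normal_ok(p_bar, q_bar, n1, n2):
    """True when n1*p_bar, n1*q_bar, n2*p_bar, and n2*q_bar are all at least 5."""
    return min(n1 * p_bar, n1 * q_bar, n2 * p_bar, n2 * q_bar) >= 5


p_bar, q_bar = pooled_estimate(182, 200, 208, 250)
se = pooled_standard_error(p_bar, q_bar, 200, 250)
print(f"p-bar = {p_bar:.4f}, q-bar = {q_bar:.4f}, "
      f"normal OK: {normal_ok(p_bar, q_bar, 200, 250)}, SE = {se:.4f}")
```

This prints `p-bar = 0.8667, q-bar = 0.1333, normal OK: True, SE = 0.0322`.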




Picturing the World

A medical research team conducted a study to test whether a drug lowers the chance of getting diabetes. In the study, 2623 people took the drug and 2646 people took a placebo. The results are shown below. (Source: The New England Journal of Medicine)

[Figure: Diabetes outcomes for the Drug and Placebo groups]

At $\alpha = 0.05$, can you support the claim that the drug lowers the chance of getting diabetes?


When the sampling distribution for $\hat{p}_1 - \hat{p}_2$ is normal, you can use a
two-sample z-test to test the difference between two population proportions $p_1$
and $p_2$.


Two-Sample z-Test for the Difference Between Proportions

A two-sample z-test is used to test the difference between two population
proportions $p_1$ and $p_2$ when these conditions are met.

1. The samples are random.
2. The samples are independent.
3. The quantities $n_1\bar{p}$, $n_1\bar{q}$, $n_2\bar{p}$, and $n_2\bar{q}$ are at least 5.

The test statistic is $\hat{p}_1 - \hat{p}_2$. The standardized test statistic is

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p}\,\bar{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

where $\bar{p} = \frac{x_1 + x_2}{n_1 + n_2}$ and $\bar{q} = 1 - \bar{p}$.

If the null hypothesis states $p_1 = p_2$, $p_1 \le p_2$, or $p_1 \ge p_2$, then $p_1 = p_2$ is
assumed and the expression $p_1 - p_2$ is equal to 0 in the preceding test.


GUIDELINES

Using a Two-Sample z-Test for the Difference Between Proportions

In Words | In Symbols
1. Verify that the samples are random and independent. |
2. Find the weighted estimate of $p_1$ and $p_2$. Verify that $n_1\bar{p}$, $n_1\bar{q}$, $n_2\bar{p}$, and $n_2\bar{q}$ are at least 5. |
3. State the claim mathematically and verbally. Identify the null and alternative hypotheses. | State $H_0$ and $H_a$.
4. Specify the level of significance. | Identify $\alpha$.
5. Determine the critical value(s). | Use Table 4 in Appendix B.
6. Determine the rejection region(s). |
7. Find the standardized test statistic and sketch the sampling distribution. | $z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p}\,\bar{q}\,(1/n_1 + 1/n_2)}}$
8. Make a decision to reject or fail to reject the null hypothesis. | If $z$ is in the rejection region, then reject $H_0$. Otherwise, fail to reject $H_0$.
9. Interpret the decision in the context of the original claim. |


Study Tip

To simplify the calculation of $z$, you can round the values of $\bar{p}$, $\bar{q}$, $\hat{p}_1$, and $\hat{p}_2$ to four decimal places, as shown in Examples 1 and 2.


A hypothesis test for the difference between proportions can also be
performed using P-values. Use the guidelines above, skipping Steps 5 and 6. After
finding the standardized test statistic, use Table 4 in Appendix B to calculate the
P-value. Then make a decision to reject or fail to reject the null hypothesis. If P
is less than or equal to $\alpha$, then reject $H_0$. Otherwise, fail to reject $H_0$.
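Putting the guidelines together, here is a sketch of the whole test that returns both the standardized test statistic and a P-value. Instead of Table 4, it computes the standard normal CDF with `math.erf`. The function name `two_prop_z_test` and its `tail` argument are hypothetical. The sketch does not round $\bar{p}$, $\bar{q}$, $\hat{p}_1$, and $\hat{p}_2$ to four decimal places, so its z-values can differ from hand-calculated ones in the last digit.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def two_prop_z_test(x1, n1, x2, n2, tail):
    """Two-sample z-test for p1 - p2, assuming p1 = p2 under H0.
    tail is 'two', 'left', or 'right', matching the alternative hypothesis."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)          # weighted estimate of p1 and p2
    q_bar = 1 - p_bar
    if min(n1 * p_bar, n1 * q_bar, n2 * p_bar, n2 * q_bar) < 5:
        raise ValueError("a normal sampling distribution cannot be used")
    se = math.sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))
    z = ((p1_hat - p2_hat) - 0) / se       # p1 - p2 = 0 under H0
    if tail == "left":
        p_value = normal_cdf(z)
    elif tail == "right":
        p_value = 1 - normal_cdf(z)
    elif tail == "two":
        p_value = 2 * (1 - normal_cdf(abs(z)))
    else:
        raise ValueError("tail must be 'two', 'left', or 'right'")
    return z, p_value


# Seat-belt study: two-tailed test at alpha = 0.10
z, p_value = two_prop_z_test(182, 200, 208, 250, "two")
print(f"z = {z:.2f}, P = {p_value:.4f}, reject H0: {p_value <= 0.10}")
```

For the seat-belt data this gives $z \approx 2.42$ and $P \le 0.10$, so $H_0$ is rejected. For the cholesterol data ($x_1 = 301$, $n_1 = 4700$, $x_2 = 357$, $n_2 = 4300$, `tail="left"`) the same function gives $z \approx -3.455$, which rounds to $-3.46$.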


Study Tip

To find $x_1$ and $x_2$, use $x_1 = n_1\hat{p}_1$ and $x_2 = n_2\hat{p}_2$.


Sample Statistics for Vehicles

Passenger cars | Pickup trucks
$n_1 = 200$ | $n_2 = 250$
$\hat{p}_1 = 0.910$ | $\hat{p}_2 = 0.832$
$x_1 = 182$ | $x_2 = 208$




See TI-84 Plus 
steps on page 487. 
EXAMPLE 1 A Two-Sample z-Test for the Difference Between Proportions


A study of 200 randomly selected occupants in passenger cars and
250 randomly selected occupants in pickup trucks shows that 91.0% of
occupants in passenger cars and 83.2% of occupants in pickup trucks wear seat
belts. At $\alpha = 0.10$, can you reject the claim that the proportion of occupants
who wear seat belts is the same for passenger cars and pickup trucks? (Adapted
from National Highway Traffic Safety Administration)


SOLUTION
The samples are random and independent. Also, the weighted estimate of
$p_1$ and $p_2$ is

$$\bar{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{182 + 208}{200 + 250} = \frac{390}{450} \approx 0.8667$$

and the value of $\bar{q}$ is

$$\bar{q} = 1 - \bar{p} \approx 1 - 0.8667 = 0.1333.$$

Because $n_1\bar{p} \approx 200(0.8667)$, $n_1\bar{q} \approx 200(0.1333)$, $n_2\bar{p} \approx 250(0.8667)$, and
$n_2\bar{q} \approx 250(0.1333)$ are at least 5, you can use a two-sample z-test. The claim
is “the proportion of occupants who wear seat belts is the same for passenger
cars and pickup trucks.” So, the null and alternative hypotheses are

$$H_0\colon p_1 = p_2 \text{ (Claim)} \quad\text{and}\quad H_a\colon p_1 \neq p_2.$$

Because the test is two-tailed and the level of significance is $\alpha = 0.10$, the
critical values are $-z_0 = -1.645$ and $z_0 = 1.645$. The rejection regions are
$z < -1.645$ and $z > 1.645$. The standardized test statistic is

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p}\,\bar{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(0.910 - 0.832) - 0}{\sqrt{(0.8667)(0.1333)\left(\frac{1}{200} + \frac{1}{250}\right)}} \approx 2.42.$$


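The figure below shows the location of the rejection regions and the
standardized test statistic $z$. Because $z$ is in the rejection region, you reject the
null hypothesis.

[Figure: standard normal curve with $1 - \alpha = 0.90$ in the middle, rejection regions $z < -1.645$ and $z > 1.645$, and $z \approx 2.42$ in the right rejection region]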


Interpretation There is enough evidence at the 10% level of significance to 
reject the claim that the proportion of occupants who wear seat belts is the 
same for passenger cars and pickup trucks. 


TRY IT YOURSELF 1 


Consider the results of the study discussed on page 439. At $\alpha = 0.05$, can you
support the claim that there is a difference between the proportion of yoga 
users who are 40- to 49-year-olds and the proportion of non-yoga users who 
are 40- to 49-year-olds? 

Answer: Page A37 




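Study Tip

To find $\hat{p}_1$ and $\hat{p}_2$, use $\hat{p}_1 = \dfrac{x_1}{n_1}$ and $\hat{p}_2 = \dfrac{x_2}{n_2}$.


Sample Statistics for Cholesterol-Reducing Medication

Received medication | Received placebo
$n_1 = 4700$ | $n_2 = 4300$
$x_1 = 301$ | $x_2 = 357$
$\hat{p}_1 \approx 0.0640$ | $\hat{p}_2 \approx 0.0830$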




EXAMPLE 2 A Two-Sample z-Test for the Difference Between Proportions


A medical research team conducted a study to test the effect of a cholesterol-
reducing medication. At the end of the study, the researchers found that of the
4700 randomly selected subjects who took the medication, 301 died of heart
disease. Of the 4300 randomly selected subjects who took a placebo, 357 died
of heart disease. At $\alpha = 0.01$, can you support the claim that the death rate due
to heart disease is lower for those who took the medication than for those who
took the placebo? (Adapted from The New England Journal of Medicine)


SOLUTION


The samples are random and independent. Also, the weighted estimate of
$p_1$ and $p_2$ is

$$\bar{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{301 + 357}{4700 + 4300} = \frac{658}{9000} \approx 0.0731$$

and the value of $\bar{q}$ is

$$\bar{q} = 1 - \bar{p} \approx 1 - 0.0731 = 0.9269.$$

Because $n_1\bar{p} \approx 4700(0.0731)$, $n_1\bar{q} \approx 4700(0.9269)$, $n_2\bar{p} \approx 4300(0.0731)$, and
$n_2\bar{q} \approx 4300(0.9269)$ are at least 5, you can use a two-sample z-test. The
claim is “the death rate due to heart disease is lower for those who took the
medication than for those who took the placebo.” So, the null and alternative
hypotheses are

$$H_0\colon p_1 \ge p_2 \quad\text{and}\quad H_a\colon p_1 < p_2 \text{ (Claim)}.$$

Because the test is left-tailed and the level of significance is $\alpha = 0.01$,
the critical value is $z_0 = -2.33$. The rejection region is $z < -2.33$. The
standardized test statistic is

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p}\,\bar{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(0.0640 - 0.0830) - 0}{\sqrt{(0.0731)(0.9269)\left(\frac{1}{4700} + \frac{1}{4300}\right)}} \approx -3.46.$$

The figure below shows the location of the rejection region and the
standardized test statistic $z$. Because $z$ is in the rejection region, you reject the
null hypothesis.

[Figure: standard normal curve with the rejection region $z < -2.33$ and $z \approx -3.46$ inside it]


Interpretation There is enough evidence at the 1% level of significance to 
support the claim that the death rate due to heart disease is lower for those 
who took the medication than for those who took the placebo. 


TRY IT YOURSELF 2 


Consider the results of the study discussed on page 439. At $\alpha = 0.05$, can you
support the claim that the proportion of yoga users with incomes of $20,000 to 
$34,999 is less than the proportion of non-yoga users with incomes of $20,000 
to $34,999? 

Answer: Page A37 


8.4 EXERCISES


[Figure for Exercise 7: How Many Subjects Had 12-Week Confirmed Disability Progression and How Many Did Not? Chart comparing the Drug and Placebo groups by disability progression and no disability progression]

[Figure for Exercise 8: How Many Subjects Survived After 12 Years and How Many Did Not? Chart comparing the Chemotherapy and Placebo groups by survived and did not survive]




Building Basic Skills and Vocabulary 


1. What conditions are necessary in order to use the z-test to test the difference 
between two population proportions? 


2. Explain how to perform a two-sample z-test for the difference between 
two population proportions. 


In Exercises 3-6, determine whether a normal sampling distribution can be
used. If it can be used, test the claim about the difference between two population
proportions $p_1$ and $p_2$ at the level of significance $\alpha$. Assume the samples are
random and independent.


3. Claim: $p_1 \neq p_2$; $\alpha = 0.01$
Sample statistics: $x_1 = 35$, $n_1 = 70$ and $x_2 = 36$, $n_2 = 60$


4. Claim: $p_1 < p_2$; $\alpha = 0.05$
Sample statistics: $x_1 = 471$, $n_1 = 785$ and $x_2 = 372$, $n_2 = 465$


5. Claim: $p_1 = p_2$; $\alpha = 0.10$
Sample statistics: $x_1 = 42$, $n_1 = 150$ and $x_2 = 76$, $n_2 = 200$


6. Claim: $p_1 > p_2$; $\alpha = 0.01$
Sample statistics: $x_1 = 6$, $n_1 = 20$ and $x_2 = 4$, $n_2 = 30$


Using and Interpreting Concepts 


Testing the Difference Between Two Proportions In Exercises 7-12,
(a) identify the claim and state $H_0$ and $H_a$, (b) find the critical value(s) and identify
the rejection region(s), (c) find the standardized test statistic $z$, (d) decide whether
to reject or fail to reject the null hypothesis, and (e) interpret the decision in the
context of the original claim. Assume the samples are random and independent.


7. Multiple Sclerosis Drug In a study to determine the effectiveness of
using a drug to treat multiple sclerosis, 488 subjects were given the drug
and 244 subjects were given a placebo. The numbers of subjects who had
12-week confirmed disability progression were tracked. The results are
shown at the left. At $\alpha = 0.01$, can you support the claim that there is a
difference in the proportion of subjects who had no 12-week confirmed
disability progression? (Adapted from The New England Journal of Medicine)


8. Cancer Drug In a study, 760 men with recurrent prostate cancer underwent
radiation with or without a type of hormone-based chemotherapy. For
24 months, 384 subjects were given the chemotherapy and 376 subjects
were given a placebo. The numbers who survived and did not survive after
12 years were tracked. The results are shown at the left. At $\alpha = 0.10$, can
you support the claim that the proportion of 12-year survivors is greater
for subjects who were given the chemotherapy than for subjects who were
given the placebo? (Adapted from The New England Journal of Medicine)


9. Young Adults In a survey of 1750 females ages 20 to 24 whose highest level
of education is completing high school, 64.4% were employed. In a survey of
2000 males ages 20 to 24 whose highest level of education is completing high
school, 73.2% were employed. At $\alpha = 0.01$, can you support the claim that
there is a difference in the proportion of those employed between the two
groups? (Adapted from National Center for Education Statistics)




10. Young Adults In a survey of 500 males ages 20 to 24, 15.8% were neither
in school nor working. In a survey of 500 females ages 20 to 24, 17.8% were
neither in school nor working. At $\alpha = 0.05$, can you support the claim that
the proportion of males ages 20 to 24 who were neither in school nor working
is less than the proportion of females ages 20 to 24 who were neither in
school nor working? (Adapted from National Center for Education Statistics)


11. Seat Belt Use In a survey of 1000 drivers from the West, 934 wear a seat
belt. In a survey of 1000 drivers from the Northeast, 909 wear a seat belt. At
$\alpha = 0.05$, can you support the claim that the proportion of drivers who wear
seat belts is greater in the West than in the Northeast? (Adapted from National
Highway Traffic Safety Administration)


12. Seat Belt Use In a survey of 1000 drivers from the Midwest, 855 wear a
seat belt. In a survey of 1000 drivers from the South, 909 wear a seat belt. At
$\alpha = 0.10$, can you support the claim that the proportion of drivers who wear
seat belts in the Midwest is less than the proportion of drivers who wear seat
belts in the South? (Adapted from National Highway Traffic Safety Administration)


Intermarriages In Exercises 13-18, use the figure, which shows the percentages
of newlyweds in the United States who have a spouse of a different race or
ethnicity. The survey included random samples of 1000 Asian newlyweds,
1000 Hispanic newlyweds, 1000 black newlyweds, and 1000 white newlyweds.
(Adapted from Pew Research Center)

[Figure: Intermarriages. Percentage of newlyweds who have a spouse of a different race or ethnicity, for Asians, Hispanics, Blacks, and Whites (Whites: 11%)]


13. Asians and Hispanics At $\alpha = 0.05$, can you reject the claim that the
proportion of newlywed Asians who have a spouse of a different race or
ethnicity is the same as the proportion of newlywed Hispanics who have a
spouse of a different race or ethnicity?


14. Blacks and Asians At $\alpha = 0.01$, can you support the claim that the
proportion of newlywed blacks who have a spouse of a different race or
ethnicity is less than the proportion of newlywed Asians who have a spouse
of a different race or ethnicity?


15. Asians and Whites At $\alpha = 0.01$, can you support the claim that the
proportion of newlywed Asians who have a spouse of a different race or
ethnicity is greater than the proportion of newlywed whites who have a
spouse of a different race or ethnicity?


16. Hispanics and Blacks At $\alpha = 0.05$, can you support the claim that the
proportion of newlywed Hispanics who have a spouse of a different race or
ethnicity is different from the proportion of newlywed blacks who have a
spouse of a different race or ethnicity?


17. Whites and Blacks At $\alpha = 0.01$, can you support the claim that the
proportion of newlywed whites who have a spouse of a different race or
ethnicity is less than the proportion of newlywed blacks who have a spouse
of a different race or ethnicity?


18. Hispanics and Whites At $\alpha = 0.05$, can you support the claim that the
proportion of newlywed Hispanics who have a spouse of a different race
or ethnicity is greater than the proportion of newlywed whites who have a
spouse of a different race or ethnicity?


[Figure for Exercises 19-22: U.S. Workforce. Percentage of full-time employed men and women in the U.S. who work 40 hours per week and who work more than 40 hours per week, shown for Men and Women]




U.S. Workforce In Exercises 19-22, use the figure shown at the left, which
gives the percentages of full-time employed men and women in the United States
who work 40 hours per week and who work more than 40 hours per week. Assume
the survey included random samples of 300 men and 250 women. (Adapted from
Gallup)


19. Men: Numbers of Hours Worked Per Week Ata = 0.01, can you reject the 
claim that the proportion of men who work 40 hours per week is the same as 
the proportion of men who work more than 40 hours per week? 


20. Women: Numbers of Hours Worked Per Week At α = 0.05, can you 
support the claim that the proportion of women who work 40 hours per week 
is greater than the proportion of women who work more than 40 hours per 
week? 


21. Working 40 Hours Per Week: Men and Women At α = 0.05, can you 
support the claim that the proportion of the U.S. workforce that works 
40 hours per week is greater for women than for men? 


22. Working More Than 40 Hours Per Week: Men and Women At α = 0.10, 
can you support the claim that the proportion of the U.S. workforce that 
works more than 40 hours per week is less for women than for men? 


Extending Concepts 


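Constructing Confidence Intervals for p₁ − p₂ You can construct a 
confidence interval for the difference between two population proportions p₁ − p₂ 
by using the inequality below, where q̂₁ = 1 − p̂₁, q̂₂ = 1 − p̂₂, and z_c is the 
critical value for the level of confidence c.

$$(\hat{p}_1 - \hat{p}_2) - z_c\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} < p_1 - p_2 < (\hat{p}_1 - \hat{p}_2) + z_c\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$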
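As a sketch of the arithmetic in the inequality above, the Python function below computes the interval from the sample proportions p̂₁ and p̂₂ and the sample sizes n₁ and n₂. The function name `diff_proportion_ci` and the sample values are hypothetical, chosen only for illustration; the critical value z_c is taken from the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportion_ci(p1_hat, n1, p2_hat, n2, c=0.95):
    """Confidence interval for p1 - p2 (hypothetical helper).

    p1_hat, p2_hat: sample proportions; n1, n2: sample sizes;
    c: level of confidence. Returns (lower, upper).
    """
    q1_hat, q2_hat = 1 - p1_hat, 1 - p2_hat
    # z_c leaves (1 - c) / 2 of the area in each tail of the standard normal curve.
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    # Margin of error: z_c times the standard error of p1_hat - p2_hat.
    margin = z_c * sqrt(p1_hat * q1_hat / n1 + p2_hat * q2_hat / n2)
    diff = p1_hat - p2_hat
    return diff - margin, diff + margin


# Hypothetical samples: 30% of 400 in sample 1 and 20% of 450 in sample 2.
low, high = diff_proportion_ci(0.30, 400, 0.20, 450, c=0.95)
print(f"({low:.4f}, {high:.4f})")
```

Because the interval lies entirely above 0 for these made-up data, it would suggest p₁ > p₂ at this level of confidence.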


In Exercises 23-26, construct the indicated confidence interval for p₁ − p₂. 
Assume the samples are random and independent. 


23. Students Planning to Study Visual and Performing Arts In a survey of 
10,000 students taking the SAT, 7% were planning to study visual and 
performing arts in college. In another survey of 8000 students taken 10 years 
before, 9% were planning to study visual and performing arts in college. 
Construct a 95% confidence interval for p₁ − p₂, where p₁ is the proportion 
from the recent survey and p₂ is the proportion from the survey taken 
10 years ago. (Adapted from The College Board) 


24. Students Undecided on an Intended College Major In a survey of 
10,000 students taking the SAT, 7% were undecided on an intended college 
major. In another survey of 8000 students taken 10 years before, 3% were 
undecided on an intended college major. Construct a 90% confidence 
interval for p₁ − p₂, where p₁ is the proportion from the recent survey and 
p₂ is the proportion from the survey taken 10 years ago. (Adapted from The 
College Board) 


25. Employment In Section 6.3, Exercises 27 and 28, let p₁ be the proportion 
of the population of U.S. college graduates who expect to stay at their first 
employer for 3 or more years and let p₂ be the proportion of the population 
of U.S. college graduates who are employed in their field of study. Construct 
a 95% confidence interval for p₁ − p₂. Compare your result with the result 
in Section 6.3, Exercise 27, part (a). 


26. Employment Repeat Exercise 25 but with a 99% confidence interval. 
Compare your result with the result in Section 6.3, Exercise 27, part (b). 


Statistics in the Real World 


Uses 


Hypothesis Testing with Two Samples Hypothesis testing enables you 
to determine whether differences in samples indicate actual differences in 
populations or are merely due to sampling error. For instance, a study conducted 
on about 1400 American children in a variety of settings compared the behavior 
of the children who attended day care with the behavior of those who stayed 
home. Aggressive behavior such as stealing toys, pushing other children, and 
starting fights was measured in both groups. The study showed that children 
who attended day care for more than 30 hours per week were about three 
times more likely to be aggressive than those who stayed home. Although the 
aggressive behavior observed in the study was well within the normal range for 
healthy children, these statistics have been used to persuade parents to keep 
their children at home until they start school. 


Abuses 


Confounding Variables The U.S. study found that the results were the same 
regardless of the quality of the day care center and the income of the family. However, 
the overall quality of care experienced by most of the children studied could be 
the problem: a survey of American day care centers that measured aspects such 
as the number and expertise of caregivers found that only 10 percent of American 
day care centers provided high-quality care. 

A similar study of preschoolers and aggressive behavior in Norway, where 
day care centers are subject to strict standards and the ratio of adult caregivers to 
children is high, found that the link between day care attendance and aggressive 
behavior was minimal. Another Norwegian study included an additional 
variable, differences between siblings, and found no relationship between day 
care attendance and behavior problems. These additional variables that are often 
out of the researcher’s control are known as confounding variables. 


Study Funding A series of studies was conducted on various methods for 
reducing the number of cigarettes that smokers smoke. The studies compared 
smokers who were simply told to smoke less with those who tried methods such 
as nicotine replacement therapy, electronic cigarettes, and reduced tar, 
carbon, or nicotine cigarettes. Some methods were shown to be effective in 
reducing the number of cigarettes smoked. 

Some of the studies were funded by the tobacco industry, which could profit 
from promoting strategies other than quitting as beneficial to smokers’ health. 
When dealing with statistics, it is always good to know who is paying for a study, 
and whether the researchers are unbiased. 


EXERCISES 


1. Confounding Variables A pharmaceutical company has applied for approval 
to market a new arthritis medication. The research involved a test group that 
was given the medication and another test group that was given a placebo. 
Describe some possible confounding variables that could influence the results 
of the study. 


2. Medical research often involves blind and double-blind testing. Explain what 
these two terms mean. 


476 CHAPTER 8 Hypothesis Testing with Two Samples 


Chapter Summary

What Did You Learn?

Section 8.1
• How to determine whether two samples are independent or dependent (Example 1; Review Exercises 1-4)
• How to perform a two-sample z-test for the difference between two means μ₁ and μ₂ using independent samples with σ₁ and σ₂ known (Examples 2, 3; Review Exercises 5-10)

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}}$$

Section 8.2
• How to perform a two-sample t-test for the difference between two means μ₁ and μ₂ using independent samples with σ₁ and σ₂ unknown (Examples 1, 2; Review Exercises 11-18)

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}}$$

Section 8.3
• How to perform a t-test to test the mean of the differences for a population of paired data (Examples 1, 2; Review Exercises 19-24)

$$t = \frac{\bar{d} - \mu_d}{s_d/\sqrt{n}}$$

Section 8.4
• How to perform a two-sample z-test for the difference between two population proportions p₁ and p₂ (Review Exercises 25-30)

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p}\,\bar{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
Two-Sample Hypothesis Testing for Population Means

Are the samples independent?
• No: Are both populations normal, or is the number of data pairs at least 30?
    Yes: Use t-test for dependent samples (Section 8.3).
    No: Cannot use hypothesis tests discussed in this chapter.
• Yes: Are both populations normal, or are both sample sizes at least 30?
    No: Cannot use hypothesis tests discussed in this chapter.
    Yes: Are both population standard deviations known?
        Yes: Use z-test (Section 8.1).
        No: Use t-test for independent samples (Section 8.2).
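The decision logic of the flowchart above can be written as a small function. This is a hypothetical helper (`choose_two_sample_test` is not from any library); each boolean argument answers one question in the flowchart, and the returned strings simply name the test to use.

```python
def choose_two_sample_test(independent, normal_or_large, sigmas_known=False):
    """Follow the chapter's flowchart for comparing two population means.

    independent: True if the samples are independent, False if dependent (paired).
    normal_or_large: True if both populations are normal, or if both sample
        sizes are at least 30 (independent samples) / there are at least
        30 data pairs (dependent samples).
    sigmas_known: True if both population standard deviations are known;
        only consulted for independent samples.
    """
    if not normal_or_large:
        return "Cannot use hypothesis tests discussed in this chapter"
    if not independent:
        return "t-test for dependent samples (Section 8.3)"
    if sigmas_known:
        return "z-test (Section 8.1)"
    return "t-test for independent samples (Section 8.2)"


# Paired before/after data from a normal population:
print(choose_two_sample_test(independent=False, normal_or_large=True))
```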




8 Review Exercises 


Section 8.1 


In Exercises 1-4, classify the two samples as independent or dependent and justify 
your answer. 


1. Sample 1: The weights of 43 adults 
   Sample 2: The weights of the same 43 adults after participating in a diet 
   and exercise program 

2. Sample 1: The weights of 39 dogs 
   Sample 2: The weights of 39 cats 

3. Sample 1: The fuel efficiencies of 20 sports utility vehicles 
   Sample 2: The fuel efficiencies of 20 minivans 

4. Sample 1: The fuel efficiencies of 12 cars 
   Sample 2: The fuel efficiencies of the same 12 cars using an alternative fuel 


In Exercises 5-8, test the claim about the difference between two population means 
μ₁ and μ₂ at the level of significance α. Assume the samples are random and 
independent, and the populations are normally distributed. 


5. Claim: μ₁ = μ₂; α = 0.05 
   Population statistics: σ₁ = 0.30 and σ₂ = 0.23 
   Sample statistics: x̄₁ = 1.28, n₁ = 96 and x̄₂ = 1.34, n₂ = 85 

6. Claim: μ₁ = μ₂; α = 0.01 
   Population statistics: σ₁ = 52 and σ₂ = 68 
   Sample statistics: x̄₁ = 5595, n₁ = 156 and x̄₂ = 5575, n₂ = 216 

7. Claim: μ₁ < μ₂; α = 0.10 
   Population statistics: σ₁ = 0.11 and σ₂ = 0.10 
   Sample statistics: x̄₁ = 0.28, n₁ = 41 and x̄₂ = 0.33, n₂ = 34 

8. Claim: μ₁ ≠ μ₂; α = 0.05 
   Population statistics: σ₁ = 14 and σ₂ = 15 
   Sample statistics: x̄₁ = 87, n₁ = 410 and x̄₂ = 85, n₂ = 340 


In Exercises 9 and 10, (a) identify the claim and state H₀ and Hₐ, (b) find the 
critical value(s) and identify the rejection region(s), (c) find the standardized test 
statistic z, (d) decide whether to reject or fail to reject the null hypothesis, and 
(e) interpret the decision in the context of the original claim. Assume the samples 
are random and independent, and the populations are normally distributed. 


9. A researcher claims that the mean sodium content of sandwiches at 
Restaurant A is less than the mean sodium content of sandwiches at 
Restaurant B. The mean sodium content of 22 randomly selected sandwiches 
at Restaurant A is 670 milligrams. Assume the population standard deviation 
is 20 milligrams. The mean sodium content of 28 randomly selected sandwiches 
at Restaurant B is 690 milligrams. Assume the population standard deviation 
is 30 milligrams. At α = 0.05, is there enough evidence to support the claim? 

10. A career counselor claims that the mean annual salary of entry-level 
paralegals in Peoria, Illinois, and Gary, Indiana, is the same. The mean annual 
salary of 40 randomly selected entry-level paralegals in Peoria is $50,410. 
Assume the population standard deviation is $9320. The mean annual salary 
of 35 randomly selected entry-level paralegals in Gary is $47,350. Assume 
the population standard deviation is $9330. At α = 0.10, is there enough 
evidence to reject the counselor’s claim? (Adapted from Salary.com) 
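The standardized test statistic in part (c) follows the Section 8.1 formula with μ₁ − μ₂ = 0 under the null hypothesis. A minimal Python sketch is below; the function name and the `tail` argument are hypothetical. As a check, it uses the inputs shown on the TI-84 Plus screens for Example 2, page 444 (σ₁ = 960, σ₂ = 845, x̄₁ = 3060, x̄₂ = 2910, n₁ = n₂ = 250), for which the calculator reports z = 1.854468212 and p = .0636720795 in a two-tailed test.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_test(xbar1, sigma1, n1, xbar2, sigma2, n2, tail="two"):
    """Two-sample z-test for mu1 - mu2 with known sigma1 and sigma2.

    Tests H0: mu1 = mu2 against "two" (mu1 != mu2), "left" (mu1 < mu2),
    or "right" (mu1 > mu2). Returns (z, P-value).
    """
    # Standard error of xbar1 - xbar2.
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z = (xbar1 - xbar2) / se  # mu1 - mu2 = 0 under H0
    std_normal = NormalDist()
    if tail == "left":
        p = std_normal.cdf(z)
    elif tail == "right":
        p = 1 - std_normal.cdf(z)
    else:
        p = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p


z, p = two_sample_z_test(3060, 960, 250, 2910, 845, 250, tail="two")
print(f"z = {z:.4f}, P = {p:.4f}")
```

Reject H₀ when the P-value is at most α (equivalently, when z falls in the rejection region).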




Section 8.2 


In Exercises 11-16, test the claim about the difference between two population 
means μ₁ and μ₂ at the level of significance α. Assume the samples are random and 
independent, and the populations are normally distributed. 


11. Claim: μ₁ = μ₂; α = 0.05. Assume σ₁² = σ₂². 
    Sample statistics: x̄₁ = 228, s₁ = 27, n₁ = 20 and 
    x̄₂ = 207, s₂ = 25, n₂ = 13 

12. Claim: μ₁ < μ₂; α = 0.10. Assume σ₁² ≠ σ₂². 
    Sample statistics: x̄₁ = 0.015, s₁ = 0.011, n₁ = 8 and 
    x̄₂ = 0.019, s₂ = 0.004, n₂ = 6 

13. Claim: μ₁ = μ₂; α = 0.10. Assume σ₁² ≠ σ₂². 
    Sample statistics: x̄₁ = 664.5, s₁ = 2.4, n₁ = 40 and 
    x̄₂ = 665.5, s₂ = 4.1, n₂ = 40 

14. Claim: μ₁ = μ₂; α = 0.01. Assume σ₁² = σ₂². 
    Sample statistics: x̄₁ = 44.5, s₁ = 5.85, n₁ = 17 and 
    x̄₂ = 49.1, s₂ = 5.25, n₂ = 18 

15. Claim: μ₁ ≠ μ₂; α = 0.01. Assume σ₁² = σ₂². 
    Sample statistics: x̄₁ = 61, s₁ = 3.3, n₁ = 5 and 
    x̄₂ = 55, s₂ = 1.2, n₂ = 7 

16. Claim: μ₁ > μ₂; α = 0.10. Assume σ₁² ≠ σ₂². 
    Sample statistics: x̄₁ = 520, s₁ = 25, n₁ = 7 and 
    x̄₂ = 500, s₂ = 55, n₂ = 6 


In Exercises 17 and 18, (a) identify the claim and state H₀ and Hₐ, (b) find the 
critical value(s) and identify the rejection region(s), (c) find the standardized test 
statistic t, (d) decide whether to reject or fail to reject the null hypothesis, and 
(e) interpret the decision in the context of the original claim. Assume the samples 
are random and independent, and the populations are normally distributed. 


17. A new method of teaching mathematics is being tested on sixth grade 
students. A group of sixth grade students is taught using the new 
curriculum. A control group of sixth grade students is taught using the 
old curriculum. The mathematics test scores for the two groups are 
shown in the back-to-back stem-and-leaf plot. 

Old Curriculum | Stem | New Curriculum 
4 5 8 | 0 | 
0 1 1 5 7 | 1 | 
1 6 | 2 | 2 4 5 7 7 
0 1 2 8 | 3 | 4 7 
0 2 6 9 | 4 | 2 5 6 7 
1 3 4 9 | 5 | 1 5 7 
0 7 | 6 | 2 3 5 6 6 7 
3 3 3 4 4 6 8 | 7 | 0 0 2 5 5 6 
1 9 | 8 | 2 3 6 6 9 
4 4 4 | 9 | 0 1 4 6 8 

Key: 6|2|2 = 26 for old curriculum and 22 for new curriculum 

At α = 0.05, is there enough evidence to support the claim that the 
new method of teaching mathematics produces higher mathematics 
test scores than the old method does? Assume the population variances 
are equal. 




18. A real estate agent claims that there is no difference between the mean 
household incomes of two neighborhoods. The mean income of 12 randomly 
selected households from the first neighborhood is $52,750 with a standard 
deviation of $2900. In the second neighborhood, 10 randomly selected 
households have a mean income of $51,200 with a standard deviation of 
$2225. At α = 0.01, can you reject the real estate agent’s claim? Assume the 
population variances are equal. 
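The standardized test statistic t for Exercises 17 and 18 depends on whether the population variances are assumed equal. The sketch below computes t for H₀: μ₁ = μ₂ in both cases. It assumes the hand-calculation convention of d.f. = n₁ + n₂ − 2 for pooled variances and d.f. = the smaller of n₁ − 1 and n₂ − 1 otherwise; software such as Minitab or the TI-84 Plus (Pooled:No) uses a Welch-type d.f. instead, so its d.f. can differ. The function name and sample values are hypothetical.

```python
from math import sqrt


def two_sample_t(xbar1, s1, n1, xbar2, s2, n2, equal_variances):
    """Standardized test statistic t (H0: mu1 = mu2) and its degrees of freedom.

    equal_variances=True pools the sample variances (d.f. = n1 + n2 - 2).
    equal_variances=False uses the unpooled standard error with the
    conservative d.f. = min(n1, n2) - 1 (assumed textbook convention).
    """
    if equal_variances:
        pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
        se = pooled_sd * sqrt(1 / n1 + 1 / n2)
        df = n1 + n2 - 2
    else:
        se = sqrt(s1**2 / n1 + s2**2 / n2)
        df = min(n1, n2) - 1
    t = (xbar1 - xbar2) / se
    return t, df


# Hypothetical samples: xbar1 = 10, s1 = 2, n1 = 10 and xbar2 = 8, s2 = 2, n2 = 10.
print(two_sample_t(10, 2, 10, 8, 2, 10, equal_variances=True))
```

Compare the returned t with the critical value(s) from a t-distribution table for the returned d.f.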


Section 8.3 


In Exercises 19-22, test the claim about the mean of the differences for a population 
of paired data at the level of significance α. Assume the samples are random and 
dependent, and the populations are normally distributed. 


19. Claim: μ_d = 0; α = 0.01. Sample statistics: d̄ = 8.5, s_d = 10.7, n = 16 

20. Claim: μ_d < 0; α = 0.10. Sample statistics: d̄ = 3.2, s_d = 5.68, n = 25 

21. Claim: μ_d = 0; α = 0.10. Sample statistics: d̄ = 10.3, s_d = 18.19, n = 33 

22. Claim: μ_d ≠ 0; α = 0.05. Sample statistics: d̄ = 17.5, s_d = 4.05, n = 37 

In Exercises 23 and 24, (a) identify the claim and state H₀ and Hₐ, (b) find the 
critical value(s) and identify the rejection region(s), (c) calculate d̄ and s_d, (d) find 
the standardized test statistic t, (e) decide whether to reject or fail to reject the 
null hypothesis, and (f) interpret the decision in the context of the original claim. 
Assume the samples are random and dependent, and the populations are normally 
distributed. 


23. A sports statistician claims that the numbers of passing yards for 
college football quarterbacks change from their junior to their senior 
years. The table shows the numbers of passing yards for 10 college 
football quarterbacks in their junior and senior years. At α = 0.05, 
is there enough evidence to support the sports statistician’s claim? 
(Source: Sports Reference, LLC) 

Player | 1 | 2 | 3 | 4 | 5 
Passing yards (junior year) | 2517 | 2291 | 3853 | 2827 | 2701 
Passing yards (senior year) | 2184 | 2946 | 3540 | 3557 | 2169 

Player | 6 | 7 | 8 | 9 | 10 
Passing yards (junior year) | 3145 | 4332 | 1001 | 2401 | 1984 
Passing yards (senior year) | 3328 | 3348 | 1464 | 2366 | 2273 


24. A physical fitness instructor claims that a weight loss supplement will help 
users lose weight after two weeks. The table shows the weights (in pounds) 
of 9 adults before using the supplement and two weeks after using the 
supplement. At α = 0.10, is there enough evidence to support the physical 
fitness instructor’s claim? 

User | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 
Weight (before) | 228 | 210 | 245 | 272 | 203 | 198 | 256 | 217 | 240 
Weight (after) | 225 | 208 | 242 | 270 | 205 | 196 | 250 | 220 | 240 
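Parts (c) and (d) of Exercises 23 and 24 (d̄, s_d, and t) can be computed as in the sketch below. It assumes d = (value before) − (value after), the convention in the Minitab output for the vertical jump example (Paired T for Before − After); reverse the subtraction if your claim is phrased the other way. The function name is hypothetical, and the sample data are the vertical jump heights from Example 1, page 461, for which Minitab reports T-Value = −2.33.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Return (d_bar, s_d, t, d.f.) for H0: mu_d = 0, where d = before - after."""
    d = [b - a for b, a in zip(before, after)]
    n = len(d)
    d_bar = mean(d)
    s_d = stdev(d)  # sample standard deviation (n - 1 in the denominator)
    t = (d_bar - 0) / (s_d / sqrt(n))  # mu_d = 0 under H0
    return d_bar, s_d, t, n - 1


before = [24, 22, 25, 28, 35, 32, 30, 27]
after = [26, 25, 25, 29, 33, 34, 35, 30]
d_bar, s_d, t, df = paired_t(before, after)
print(f"d_bar = {d_bar:.3f}, s_d = {s_d:.3f}, t = {t:.2f}, d.f. = {df}")
```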




Section 8.4 


In Exercises 25-28, determine whether a normal sampling distribution can be 
used. If it can be used, test the claim about the difference between two population 
proportions p₁ and p₂ at the level of significance α. Assume the samples are random 
and independent. 


25. Claim: p₁ = p₂; α = 0.05 
    Sample statistics: x₁ = 425, n₁ = 840 and x₂ = 410, n₂ = 760 

26. Claim: p₁ ≤ p₂; α = 0.01 
    Sample statistics: x₁ = 36, n₁ = 100 and x₂ = 46, n₂ = 200 

27. Claim: p₁ > p₂; α = 0.10 
    Sample statistics: x₁ = 261, n₁ = 556 and x₂ = 207, n₂ = 483 

28. Claim: p₁ < p₂; α = 0.05 
    Sample statistics: x₁ = 86, n₁ = 900 and x₂ = 107, n₂ = 1200 
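Exercises 25-30 combine a check that a normal sampling distribution can be used with the Section 8.4 z formula. The sketch below assumes the check is that n₁p̄, n₁q̄, n₂p̄, and n₂q̄ are all at least 5, with p̄ = (x₁ + x₂)/(n₁ + n₂) and q̄ = 1 − p̄. The function name and `tail` argument are hypothetical. As a check it reproduces the TI-84 Plus 2-PropZTest output for Example 1, page 471 (n₁ = 200, x₂ = 208, n₂ = 250; x₁ = 182 is inferred from p̂₁ = .91 and n₁ = 200), where the calculator reports z = 2.418677324 and p = .0155770453.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_z_test(x1, n1, x2, n2, tail="two"):
    """Two-sample z-test for p1 - p2 (H0: p1 = p2). Returns (z, P-value)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)  # weighted estimate of the common proportion
    q_bar = 1 - p_bar
    # Assumed criterion for using a normal sampling distribution.
    if min(n1 * p_bar, n1 * q_bar, n2 * p_bar, n2 * q_bar) < 5:
        raise ValueError("normal sampling distribution cannot be used")
    z = (p1_hat - p2_hat) / sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))
    std_normal = NormalDist()
    if tail == "left":
        p = std_normal.cdf(z)
    elif tail == "right":
        p = 1 - std_normal.cdf(z)
    else:
        p = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p


z, p = two_prop_z_test(182, 200, 208, 250, tail="two")
print(f"z = {z:.4f}, P = {p:.4f}")
```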


In Exercises 29 and 30, (a) identify the claim and state H₀ and Hₐ, (b) find the 
critical value(s) and identify the rejection region(s), (c) find the standardized test 
statistic z, (d) decide whether to reject or fail to reject the null hypothesis, and 
(e) interpret the decision in the context of the original claim. Assume the samples 
are random and independent. 


29. A medical research team conducted a study to test the effect of a drug 
used to treat a type of inflammation. In the study, 68 subjects took the drug 
and 68 subjects took a placebo. The results are shown below. At α = 0.05, 
can you reject the claim that the proportion of subjects who had at least 
24 weeks of accrued remission is the same for the two groups? (Source: The 
New England Journal of Medicine) 

[Figure: Do You Have At Least 24 Weeks of Accrued Remission? Responses for the Drug and Placebo groups]


30. A traffic safety research team conducted a survey over two years on the 
use of motorcycle helmets. In the survey, each year 1000 motorcyclists 
were asked whether they use helmets that are compliant with federal safety 
regulations. The results are shown below. At α = 0.01, can you support the 
claim that the proportion of motorcyclists who use such helmets increased 
from the first year to the second year? (Adapted from National Highway Traffic 
Safety Administration) 

[Figure: Do You Use Helmets that Are Compliant with Federal Safety Regulations? Responses for the first year and the second year]




8 Chapter Quiz 


Take this quiz as you would take a quiz in class. After you are done, check your 
work against the answers given in the back of the book. 


For each exercise, perform the steps below. 


(a) Identify the claim and state H₀ and Hₐ. 


(b) Determine whether the hypothesis test is left-tailed, right-tailed, or two-tailed, 
and whether to use a z-test or a t-test. Explain your reasoning. 


(c) Find the critical value(s) and identify the rejection region(s). 
(d) Find the appropriate standardized test statistic. 
(e) Decide whether to reject or fail to reject the null hypothesis. 


(f) Interpret the decision in the context of the original claim. 
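For the z-tests in this quiz, steps (c) and (e) reduce to comparing z with critical values from the standard normal distribution. The hypothetical helpers below compute those critical values with the standard library's `statistics.NormalDist`; t-tests need a t-distribution table (or a statistics package) instead, which this sketch does not cover.

```python
from statistics import NormalDist


def z_critical_values(alpha, tail):
    """Critical value(s) z0 for a z-test at level alpha; tail is "left", "right", or "two"."""
    std_normal = NormalDist()
    if tail == "left":
        return (std_normal.inv_cdf(alpha),)
    if tail == "right":
        return (std_normal.inv_cdf(1 - alpha),)
    z0 = std_normal.inv_cdf(1 - alpha / 2)
    return (-z0, z0)


def reject_h0(z, alpha, tail):
    """True if the standardized test statistic z lies in the rejection region."""
    crit = z_critical_values(alpha, tail)
    if tail == "left":
        return z < crit[0]
    if tail == "right":
        return z > crit[0]
    return z < crit[0] or z > crit[1]


print(z_critical_values(0.05, "two"))
print(reject_h0(2.10, 0.05, "right"))
```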


1. The mean score on a reading assessment test for 49 randomly selected male 
high school students was 279. Assume the population standard deviation is 
41. The mean score on the same test for 50 randomly selected female high 
school students was 278. Assume the population standard deviation is 39. 
At α = 0.05, can you support the claim that the mean score on the reading 
assessment test for male high school students is greater than the mean score 
for female high school students? (Adapted from National Center for Education 
Statistics) 


2. A music teacher claims that the mean scores on a music assessment test 
for eighth grade boys and girls are equal. The mean score for 13 randomly 
selected boys is 142 with a standard deviation of 49, and the mean score for 
15 randomly selected girls is 156 with a standard deviation of 42. At α = 0.1, 
can you reject the teacher’s claim? Assume the populations are normally 
distributed and the population variances are equal. (Adapted from National 
Center for Education Statistics) 


3. The table shows the credit scores for 12 randomly selected adults who are 
considered high-risk borrowers before and two years after they attend a 
personal finance seminar. At α = 0.01, is there enough evidence to 
support the claim that the personal finance seminar helps adults increase 
their credit scores? Assume the populations are normally distributed. 

Adult | 1 | 2 | 3 | 4 | 5 | 6 
Credit score (before seminar) | 608 | 620 | 610 | 650 | 640 | 680 
Credit score (after seminar) | 646 | 692 | 715 | 669 | 725 | 786 

Adult | 7 | 8 | 9 | 10 | 11 | 12 
Credit score (before seminar) | 655 | 602 | 644 | 656 | 632 | 664 
Credit score (after seminar) | 700 | 650 | 660 | 650 | 680 | 702 


4. In a random sample of 1020 U.S. adults in a recent year, 459 approve of the 
job the Supreme Court is doing. In another random sample of 1510 U.S. 
adults taken 3 years prior, 694 approve of the job the Supreme Court is doing. 
At α = 0.05, can you support the claim that the proportion of U.S. adults 
who approve of the job the Supreme Court is doing is less than it was 3 years 
prior? (Adapted from Gallup) 


8 Chapter Test 




Take this test as you would take a test in class. 


For each exercise, perform the steps below. 


(a) Identify the claim and state H₀ and Hₐ. 
(b) Determine whether the hypothesis test is left-tailed, right-tailed, or two-tailed, 
and whether to use a z-test or a t-test. Explain your reasoning. 


(c) Find the critical value(s) and identify the rejection region(s). 


(d) Find the appropriate standardized test statistic. 


(e) Decide whether to reject or fail to reject the null hypothesis. 


(f) Interpret the decision in the context of the original claim. 


1. In a survey of 5000 students taking the SAT, 350 were undecided on an 
intended college major. In another survey of 12,000 students taken 10 years 
before, 360 were undecided on an intended college major. At α = 0.10, can 
you reject the claim that the proportion of students taking the SAT who are 
undecided on an intended college major has not changed? (Adapted from The 
College Board) 


2. A real estate agency says that the mean home sales price in Olathe, Kansas, 
is greater than in Rolla, Missouri. The mean home sales price for 64 homes 
in Olathe is $356,889. Assume the population standard deviation is $537,407. 
The mean home sales price for 36 homes in Rolla is $189,389. Assume the 
population standard deviation is $113,555. At α = 0.05, is there enough 
evidence to support the agency’s claim? (Adapted from RealtyTrac) 


3. A physical therapist suggests that soft tissue massage therapy helps to 
reduce the lengths of time patients suffer from headaches. The table shows 
the numbers of hours per day 18 patients suffered from headaches before 
and after 6 weeks of receiving treatment. At α = 0.05, is there enough 
evidence to support the therapist’s claim? Assume the populations are 
normally distributed. (Adapted from Annals of Musculoskeletal Medicine) 

Patient | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 
Hours (before) | 5.2 | 5.1 | 4.9 | 1.6 | 6.1 | 2.3 | 4.6 | 5.2 | 3.1 
Hours (after) | 3.5 | 3.3 | 3.7 | 2.3 | 2.7 | 2.4 | 2.1 | 2.5 | 2.8 

Patient | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 
Hours (before) | 4.4 | 4.2 | 5.4 | 3.3 | 5.2 | 3.7 | 2.6 | 2.7 | 2.6 
Hours (after) | 4.1 | 3.0 | 2.4 | 2.4 | 2.7 | 2.6 | 2.4 | 2.7 | 2.4 

4. A demographics researcher claims that the mean household income in a 
recent year is different in Polk County, Iowa, than it is in Woodward County, 
Oklahoma. In Polk County, a sample of 13 residents has a mean household 
income of $61,300 and a standard deviation of $1770. In Woodward County, 
a sample of 15 residents has a mean household income of $59,800 and a 
standard deviation of $8350. At α = 0.01, can you support the demographics 
researcher’s claim? Assume the populations are normally distributed and the 
population variances are not equal. (Adapted from U.S. Census Bureau) 


Putting it all together 


REAL DECISIONS 


The U.S. Department of Health & Human Services (HHS) is a 
department of the U.S. federal government with the motto “Improving 
the health, safety, and well-being of America.” The Centers for 
Medicare & Medicaid Services work within the HHS to help administer 
Medicare, Medicaid, and other health programs. They also gather 
information about health expenditure, program utilization, and other 
data. One area studied is the average amount of time that Medicare 
patients spend at short-stay hospitals. 

You work for the Centers for Medicare & Medicaid Services. You 
want to use a random sample of inpatient records to test the claim that 
the mean length of stay for inpatients in 2015 is different from what it 
was in 2000. The results for several inpatients from 2000 and 2015 are 
shown in the histograms. 


EXERCISES 


1. How Could You Do It? 


Explain how you could use each sampling technique to select the 
sample for the study. 


(a) stratified sample 
(b) cluster sample 
(c) systematic sample 
(d) simple random sample 
2. Choosing a Sampling Technique 


(a) Which sampling technique in Exercise 1 would you choose to 
implement for the study? Why? 

(b) Identify possible flaws or biases in your study. 

3. Choosing a Test 
To test the claim that there is a difference in the mean length of 
hospital stays, should you use a z-test or a t-test? Are the samples 
independent or dependent? Do you need to know what each 
population’s distribution is? Do you need to know anything about the 
population variances? 

4. Testing a Mean 
Test the claim that there is a difference in the mean length of hospital 
stays for inpatients. Assume the populations are normal and the 
population variances are equal. Use α = 0.05. Interpret the test’s 
decision. Does the decision support the claim? 




[Figure: Inpatients Length of Stay, histograms of frequency versus length of stay (in days). 2000: x̄₁ = 6, s₁ = 1.63, n₁ = 28. 2015: x̄₂ = 5.23, s₂ = 1.81, n₂ = 30.]


TECHNOLOGY 


Tails Over Heads 


In the article “Tails over Heads” in the Washington Post (Oct. 13, 1996), 
journalist William Casey describes one of his hobbies: keeping track of every 
coin he finds on the street! From January 1, 1985, until the article was 
written, Casey found 11,902 coins. 

As each coin is found, Casey records the time, date, 
location, value, mint location, and whether the coin is 
lying heads up or tails up. In the article, Casey notes 
that 6130 coins were found tails up and 5772 were found 
heads up. Of the 11,902 coins found, 43 were minted in 
San Francisco, 7133 were minted in Philadelphia, and 
4726 were minted in Denver. 

A simulation of Casey’s experiment can be done in 
Minitab as shown below. A frequency histogram of one 
simulation’s results is shown at the right. 


[Minitab Binomial random-data dialog: Number of rows of data to generate: 500; Store in column(s): C1; Number of trials: 11902; Event probability: .5]

[Figure: Coin Toss Simulation, frequency histogram of one simulation's results]
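The Minitab simulation can be sketched in Python as well. This is an assumed equivalent, not Minitab's algorithm: each of the 500 rows counts the heads among 11,902 fair coins, using the fact that the number of 1 bits in a random 11,902-bit integer follows a binomial distribution with n = 11,902 and p = 0.5. The function name `simulate_heads` and the seed are hypothetical; different seeds give different (but comparable) results, as Exercise 3 anticipates.

```python
import random


def simulate_heads(num_coins=11902, num_rows=500, seed=None):
    """Simulate num_rows repetitions of finding num_coins fair coins.

    Returns a list with the number of coins lying heads up in each repetition.
    """
    rng = random.Random(seed)
    # getrandbits(k) yields k independent fair bits; counting the 1 bits gives
    # a Binomial(k, 0.5) count of heads.
    return [bin(rng.getrandbits(num_coins)).count("1") for _ in range(num_rows)]


heads = simulate_heads(seed=1)
casey_heads = 5772
share = sum(h <= casey_heads for h in heads) / len(heads)
print(f"Share of simulated trials with at most {casey_heads} heads: {share:.3f}")
```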


1. Use technology to perform a one-sample z-test to test the hypothesis that 
the proportion of coins found lying heads up is 0.5. Use α = 0.01. Use Casey’s 
data as your sample and write your conclusion as a sentence. 

2. Do Casey’s data differ significantly from chance? If so, what might be the reason? 

3. In the simulation shown above, what percent of the trials had heads less than 
or equal to the number of heads in Casey’s data? Use technology to repeat the 
simulation. Are your results comparable? 

In Exercises 4 and 5, use technology to perform a two-sample t-test to determine 
whether there is a difference in the mint dates and in the values of coins found on a 
street from 1985 through 1996 for the two mint locations. Write your conclusion as 
a sentence. Use α = 0.05. 

4. Mint dates of coins (years) 
   Philadelphia: x̄₁ = 1984.8, s₁ = 8.6 
   Denver: x̄₂ = 1983.4, s₂ = 8.4 
   Assume population variances are equal. 

5. Value of coins (dollars) 
   Philadelphia: x̄₁ = $0.034, s₁ = $0.054 
   Denver: x̄₂ = $0.033, s₂ = $0.052 
   Assume population variances are not equal. 


Extended solutions are given in the technology manuals that accompany this text. 
Technical instruction is provided for Minitab, Excel, and the TI-84 Plus. 




8 Using Technology to Perform Two-Sample Hypothesis Tests 


Here are some Minitab and TI-84 Plus printouts for several examples in this chapter. 


[Minitab menu screens listing 1-Sample Z..., 1-Sample t..., 2-Sample t..., Paired t..., 1 Proportion..., and 2 Proportions...]


See Example 1, page 452. 


Two-Sample T-Test and CI 

Sample   N   Mean  StDev  SE Mean 
1        8  473.0   39.7       14 
2       18  459.0   24.5      5.8 

Difference = mu (1) - mu (2) 
Estimate for difference: 14.0 
90% CI for difference: (-13.8, 41.8) 
T-Test of difference = 0 (vs not =): T-Value = 0.92 P-Value = 0.3880 DF = 9 


See Example 1, page 461. 
Vertical Jump Heights, Before and After Using Shoes 

Athlete | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 
Vertical jump height (before using shoes) | 24 | 22 | 25 | 28 | 35 | 32 | 30 | 27 
Vertical jump height (after using shoes) | 26 | 25 | 25 | 29 | 33 | 34 | 35 | 30 

Paired T-Test and CI: Before, After 

Paired T for Before - After 

             N    Mean  StDev  SE Mean 
Before       8   27.88   4.32     1.53 
After        8   29.63   4.07     1.44 
Difference   8  -1.750  2.121    0.750 

90% upper bound for mean difference: -0.689 
T-Test of mean difference = 0 (vs < 0): T-Value = -2.33 P-Value = 0.026 




See Example 2, page 444. 


TI-84 PLUS 
EDIT CALC TESTS 
1: Z-Test... 
2: T-Test... 
3: 2-SampZTest... 
4: 2-SampTTest... 
5: 1-PropZTest... 
6: 2-PropZTest... 
7: ZInterval... 

TI-84 PLUS 
2-SampZTest 
Inpt:Data Stats 
σ1:960 
σ2:845 
x̄1:3060 
n1:250 
x̄2:2910 
n2:250 
μ1:≠μ2 <μ2 >μ2 
Calculate Draw 

TI-84 PLUS 
2-SampZTest 
μ1≠μ2 
z=1.854468212 
p=.0636720795 
x̄1=3060 
x̄2=2910 
n1=250 


See Example 2, page 453. 


TI-84 PLUS 
EDIT CALC TESTS 
1: Z-Test... 
2: T-Test... 
3: 2-SampZTest... 
4: 2-SampTTest... 
5: 1-PropZTest... 
6: 2-PropZTest... 
7: ZInterval... 

TI-84 PLUS 
2-SampTTest 
Inpt:Data Stats 
x̄1:.48 
n1:30 
Sx2:.07 
n2:32 
μ1:≠μ2 <μ2 >μ2 
Pooled:No 
Calculate Draw 

TI-84 PLUS 
2-SampTTest 
μ1<μ2 
p=.0291499618 
df=60 
x̄1=.48 


See Example 1, page 471. 


TI-84 PLUS 
EDIT CALC TESTS 
1: Z-Test... 
2: T-Test... 
3: 2-SampZTest... 
4: 2-SampTTest... 
5: 1-PropZTest... 
6: 2-PropZTest... 
7: ZInterval... 

TI-84 PLUS 
2-PropZTest 
x1:182 
n1:200 
x2:208 
n2:250 
p1:≠p2 <p2 >p2 
Calculate Draw 

TI-84 PLUS 
2-PropZTest 
p1≠p2 
z=2.418677324 
p=.0155770453 
p̂1=.91 
p̂2=.832 
p̂=.8666666667 




CHAPTERS 6-8 




1. In a survey of 3015 U.S. adults, 80% say their household contains a desktop 
or laptop computer. (Source: Pew Research Center) 


(a) Construct a 95% confidence interval for the proportion of U.S. adults 
who say their household contains a desktop or laptop computer. 


(b) A researcher claims that more than 75% of U.S. adults say their 
household contains a desktop or laptop computer. At α = 0.05, can you 
support the researcher’s claim? Interpret the decision in the context of 
the original claim. 


2. Gas Mileage The table shows the gas mileages (in miles per gallon) of eight 
cars with and without using a fuel additive. At α = 0.10, is there enough 
evidence to conclude that the additive improved gas mileage? Assume the 
populations are normally distributed. 

Car | 1 | 2 | 3 | 4 
Gas mileage (without fuel additive) | 23.1 | 25.4 | 21.9 | 24.3 
Gas mileage (with fuel additive) | 23.6 | 27.7 | 23.6 | 26.8 

Car | 5 | 6 | 7 | 8 
Gas mileage (without fuel additive) | 19.9 | 21.2 | 25.9 | 24.8 
Gas mileage (with fuel additive) | 22.1 | 22.4 | 26.3 | 26.6 


In Exercises 3-6, construct the indicated confidence interval for the population 
mean μ. Which distribution did you use to create the confidence interval? 

3. c = 0.95, x̄ = 26.97, σ = 3.4, n = 42 
4. c = 0.95, x̄ = 3.46, s = 1.63, n = 16 
5. c = 0.99, x̄ = 12.1, s = 2.64, n = 26 
6. c = 0.90, x̄ = 8.21, σ = 0.62, n = 8 


In Exercises 7-10, the statement represents a claim. Write its complement and state 
which is Hy and which is H,. 


1k 
. p = 0.19 
. « = 0.63 
. bb A 2.28 


je =< 83) 


11. A pediatrician claims that the mean birth weight of a single-birth baby is 
greater than the mean birth weight of a baby that has a twin. The mean 
birth weight of a random sample of 85 single-birth babies is 3086 grams. 
Assume the population standard deviation is 563 grams. The mean birth 
weight of a random sample of 68 babies that have a twin is 2263 grams. 
Assume the population standard deviation is 624 grams. At α = 0.10, can 
you support the pediatrician’s claim? Interpret the decision in the context of 
the original claim. 


CHAPTER 8 Hypothesis Testing with Two Samples 


12. The mean room rate for two adults for a random sample of 26 three-star hotels in Cincinnati has a sample standard deviation of $31. Assume the population is normally distributed. (Adapted from Expedia)

(a) Construct a 99% confidence interval for the population variance.

(b) Construct a 99% confidence interval for the population standard deviation.

(c) A travel analyst claims that the standard deviation of the mean room rate for two adults at three-star hotels in Cincinnati is at most $30. At α = 0.01, can you reject the travel analyst’s claim? Interpret the decision in the context of the original claim.

13. An education organization claims that the mean SAT scores for male athletes and male non-athletes at a college are different. A random sample of 26 male athletes at the college has a mean SAT score of 1189 and a standard deviation of 218. A random sample of 18 male non-athletes at the college has a mean SAT score of 1376 and a standard deviation of 186. At α = 0.05, can you support the organization’s claim? Interpret the decision in the context of the original claim. Assume the populations are normally distributed and the population variances are equal.


14. The annual earnings (in dollars) for 30 randomly selected locksmiths are shown below. Assume the population is normally distributed. (Adapted from Salary.com)

44,044 42,206 38,262 57,022 66,462 50,211
64,804 67,191 55,101 64,962 49,634 47,516
61,710 43,514 60,622 30,600 56,477 60,747
54,275 54,266 54,367 48,420 40,549 65,291
64,842 52,435 49,179 48,042 63,648 49,142

(a) Construct a 95% confidence interval for the population mean annual earnings for locksmiths.

(b) A researcher claims that the mean annual earnings for locksmiths is $53,000. At α = 0.05, can you reject the researcher’s claim? Interpret the decision in the context of the original claim.

15. A medical research team studied the use of a marijuana extract to treat children with an epilepsy disorder. Of the 52 children who were given the extract, the number of convulsive seizures was reduced from 12 to 6 per month. Of the 56 children who were given a placebo, the number of convulsive seizures was reduced from 15 to 14 per month. At α = 0.10, can you support the claim that the proportion of monthly convulsive seizure reduction is greater for the group that received the extract than for the group that received the placebo? Interpret the decision in the context of the original claim. (Adapted from the New England Journal of Medicine)

16. A random sample of 40 ostrich eggs has a mean incubation period of 42 days. Assume the population standard deviation is 1.6 days.

(a) Construct a 95% confidence interval for the population mean incubation period.

(b) A zoologist claims that the mean incubation period for ostriches is at least 45 days. At α = 0.05, can you reject the zoologist’s claim? Interpret the decision in the context of the original claim.

17. A researcher claims that 5% of people who wear eyeglasses purchase their eyeglasses online. Describe type I and type II errors for a hypothesis test of the claim. (Source: Consumer Reports)

Cumulative Review 489 


CHAPTER 9
Correlation and Regression

Correlation
Activity
Linear Regression
Activity
Case Study
Measures of Regression and Prediction Intervals
Multiple Regression

Uses and Abuses 
Real Statistics—Real Decisions 
Technology 




In 2016, the Los Angeles Dodgers had the highest team salary in Major League Baseball 
at $231.3 million and the highest average attendance at 45,720. The Tampa Bay Rays had 
the lowest team salary at $48.2 million and the lowest average attendance at 15,879. 




Where You’ve Been

In Chapters 1-8, you studied descriptive statistics, probability, and inferential statistics. One of the techniques you learned in descriptive statistics was graphing paired data with a scatter plot (see Section 2.2). For instance, the salaries and average attendances at home games for the teams in Major League Baseball in 2016 are shown in the scatter plot at the right and in the table below.

[Figure: scatter plot titled "Major League Baseball"; horizontal axis: Salary (in millions of dollars)]

Salary (in millions of dollars) | 78.4 | 75.0 | 153.7 | 218.7 | 176.1 | 113.4 | 77.3 | 94.5 | 89.7 | 199.9
Average attendance per home game | 25,138 | 24,950 | 26,819 | 36,487 | 39,906 | 21,559 | 23,384 | 19,650 | 32,130 | 31,173

Salary (in millions of dollars) | 89.5 | 125.1 | 139.7 | 231.3 | 72.5 | 52.1 | 93.3 | 155.2 | 193.2 | 55.0
Average attendance per home game | 28,477 | 31,577 | 37,236 | 45,720 | 21,405 | 28,575 | 24,246 | 34,440 | 37,820 | 18,784

Salary (in millions of dollars) | 84.8 | 81.2 | 50.7 | 137.2 | 177.0 | 150.4 | 48.2 | 212.1 | 182.7 | 153.0
Average attendance per home game | 23,644 | 27,768 | 29,030 | 27,999 | 41,546 | 42,525 | 15,879 | 33,462 | 41,878 | 30,641


Where You’re Going

In this chapter, you will study how to describe and test the significance of relationships between two variables when data are presented as ordered pairs. For instance, in the scatter plot above, it appears that higher team salaries tend to correspond to higher average attendances and lower team salaries tend to correspond to lower average attendances. This relationship is described by saying that the team salaries are positively correlated with the average attendances. Graphically, the relationship can be described by drawing a line, called a regression line, that fits the points as closely as possible, as shown below. The second scatter plot below shows the salaries and wins for the teams in Major League Baseball in 2016. From the scatter plot, it appears that there is a positive correlation between the team salaries and wins.


[Figure: two scatter plots, each titled "Major League Baseball", with Salary (in millions of dollars) on the horizontal axis]




492 CHAPTER 9 Correlation and Regression 


What You Should Learn 


» An introduction to linear correlation, independent and dependent variables, and the types of correlation
» How to find a correlation coefficient
» How to test a population correlation coefficient ρ using a table
» How to perform a hypothesis test for a population correlation coefficient ρ
» How to distinguish between correlation and causation


Correlation

An Overview of Correlation ■ Correlation Coefficient ■ Using a Table to Test a Population Correlation Coefficient ρ ■ Hypothesis Testing for a Population Correlation Coefficient ρ ■ Correlation and Causation


An Overview of Correlation 


Suppose a safety inspector wants to determine whether a relationship exists 
between the number of hours of training for an employee and the number of 
accidents involving that employee. Or suppose a psychologist wants to know 
whether a relationship exists between the number of hours a person sleeps each 
night and that person’s reaction time. How would he or she determine if any 
relationship exists? 

In this section, you will study how to describe what type of relationship, or 
correlation, exists between two quantitative variables and how to determine 
whether the correlation is significant. 


DEFINITION 


A correlation is a relationship between two variables. The data can be represented by the ordered pairs (x, y), where x is the independent (or explanatory) variable and y is the dependent (or response) variable.


In Section 2.2, you learned that the graph of ordered pairs (x, y) is called 
a scatter plot. In a scatter plot, the ordered pairs (x, y) are graphed as points in 
a coordinate plane. The independent (explanatory) variable x is measured on 
the horizontal axis, and the dependent (response) variable y is measured on the 
vertical axis. A scatter plot can be used to determine whether a linear (straight 
line) correlation exists between two variables. The scatter plots below show 
several types of correlation. 


[Figure: four scatter plots. Negative Linear Correlation (as x increases, y tends to decrease); Positive Linear Correlation (as x increases, y tends to increase); No Correlation; Nonlinear Correlation]


GDP (in trillions of dollars), x | CO2 emissions (in millions of metric tons), y
1.8 | 604.4
1.3 | 434.2
2.4 | 544.0
1.5 | 370.4
3.9 | 742.3
2.1 | 340.5
0.9 | 232.0
1.4 | 262.3
3.0 | 441.9
4.6 | 1157.7


Tech Tip

Remember that all data sets containing 20 or more entries are available electronically. Also, some of the data sets in this section are used throughout the chapter, so save any data that you enter. For instance, the data used in Example 1 is also used later in this section and in Sections 9.2 and 9.3.


[Figure: scatter plot for Example 2; horizontal axis: Hours of exercise]


SECTION 9.1. Correlation 493 


EXAMPLE 1
Constructing a Scatter Plot


An economist wants to determine whether there is a linear relationship 
between a country’s gross domestic product (GDP) and carbon dioxide (CO2) 
emissions. The data are shown in the table at the left. Display the data in 
a scatter plot and describe the type of correlation. (Source: World Bank and 
U.S. Energy Information Administration) 


SOLUTION 


The scatter plot is shown below. From the scatter plot, it appears that there is 
a positive linear correlation between the variables. 


[Figure: scatter plot of CO2 emissions (in millions of metric tons) versus GDP (in trillions of dollars)]


Interpretation Reading from left to right, as the gross domestic products 
increase, the carbon dioxide emissions tend to increase. 


TRY IT YOURSELF 1 


A director of alumni affairs at a small college wants to determine whether 
there is a linear relationship between the number of years alumni have been 
out of school and their annual contributions (in thousands of dollars). The data 
are shown in the table below. Display the data in a scatter plot and describe 
the type of correlation. 


Number of years out of school, x | 1 | 10 | 5 | 15 | 3 | 24 | 30
Annual contribution (in 1000s of $), y | 12.5 | 8.7 | 14.6 | 5.2 | 9.9 | 3.1 | 2.7


Answer: Page A38 


EXAMPLE 2
Constructing a Scatter Plot


A student conducts a study to determine whether there is a linear relationship 
between the number of hours a student exercises each week and the student’s 
grade point average (GPA). The data are shown in the table below. Display 
the data in a scatter plot and describe the type of correlation. 


Hours of exercise, x | 12 | 3 | 0 | 6 | 10 | 2 | 18 | 14 | 15 | 5
GPA, y | 3.6 | 4.0 | 3.9 | 2.5 | 2.4 | 2.2 | 3.7 | 3.0 | 1.8 | 3.1


SOLUTION 


The scatter plot is shown at the left. From the scatter plot, it appears that there 
is no linear correlation between the variables. 


Interpretation The number of hours a student exercises each week does not 
appear to be related to the student’s grade point average. 




Duration, x | Time, y | Duration, x | Time, y
1.80 | 56 | 3.78 | 79
1.82 | 58 | 3.83 | 85
1.90 | 62 | 3.88 | 80
1.93 | 56 | 4.10 | 89
1.98 | a7 | 4.27 | 90
2.05 | 57 | 4.30 | 89
2.13 | 60 | 4.43 | 89
2.30 | 57 | 4.47 | 86
2.37 | 61 | 4.53 | 89
2.82 | 73 | 4.55 | 86
3.13 | 76 | 4.60 | 92
3.27 | 71 | 4.63 | 91
3.65 | 77 | |


TRY IT YOURSELF 2 
A researcher conducts a study to determine whether there is a linear 
relationship between a person’s height (in inches) and pulse rate (in beats per 
minute). The data are shown in the table below. Display the data in a scatter 
plot and describe the type of correlation. 

Height, x | 68 | 72 | 65 | 70 | 62 | 75 | 78 | 64 | 68
Pulse rate, y | 90 | 85 | 88 | 100 | 105 | 98 | 70 | 65 | 72


Answer: Page A38 


EXAMPLE 3
Constructing a Scatter Plot Using Technology


Old Faithful, located in Yellowstone National Park, is the world’s most famous 
geyser. The durations (in minutes) of several of Old Faithful’s eruptions and 
the times (in minutes) until the next eruption are shown in the table at the 
left. Use technology to display the data in a scatter plot. Describe the type of 
correlation. 


SOLUTION 

MINITAB, Excel, the TI-84 Plus, and StatCrunch each have features for 
graphing scatter plots. Try using this technology to draw the scatter plots 
shown. From the scatter plots, it appears that the variables have a positive 
linear correlation. 


[Figure: technology displays (including STATCRUNCH) of scatter plots of Time (in minutes) versus Duration (in minutes)]


Interpretation Reading from left to right, as the durations of the eruptions 
increase, the times until the next eruption tend to increase. 


TRY IT YOURSELF 3 


Consider the data on page 491 on the salaries and average attendances at home 
games for the teams in Major League Baseball. Use technology to display the 
data in a scatter plot. Describe the type of correlation. 

Answer: Page A38 




Correlation Coefficient 


Interpreting correlation using a scatter plot is subjective. A more precise way to measure the type and strength of a linear correlation between two variables is to calculate the correlation coefficient. A formula for the sample correlation coefficient is given, but it is more convenient to use technology to calculate this value.


DEFINITION

The correlation coefficient is a measure of the strength and the direction of a linear relationship between two variables. The symbol r represents the sample correlation coefficient. A formula for r is

$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}} \qquad \text{Sample correlation coefficient}$$

where n is the number of pairs of data. The population correlation coefficient is represented by ρ (the lowercase Greek letter rho, pronounced “row”).

Study Tip
The formal name for r is the Pearson product moment correlation coefficient. It is named after the English statistician Karl Pearson (1857-1936). (See page 57.)


The range of the correlation coefficient is -1 to 1, inclusive. When x and y have a strong positive linear correlation, r is close to 1. When x and y have a strong negative linear correlation, r is close to -1. When x and y have perfect positive linear correlation or perfect negative linear correlation, r is equal to 1 or -1, respectively. When there is no linear correlation, r is close to 0. It is important to remember that when r is close to 0, it does not mean that there is no relation between x and y, just that there is no linear relation. Several examples are shown below.


[Figure: six scatter plots. Perfect positive correlation, r = 1 (horizontal axis: Number of adult movie tickets); Strong positive correlation, r = 0.81 (Height (in inches)); Weak positive correlation, r = 0.45 (Income per year (in thousands of dollars)); Perfect negative correlation, r = -1 (Number incorrect); Strong negative correlation, r = -0.92 (Number of absences); No correlation, r = 0.04 (IQ score)]


To use a correlation coefficient r to make an inference about a population, 
it is required that (1) the sample paired data (x, y) is random and (2) x and y 
have a bivariate normal distribution (you will learn more about this distribution 
in Section 9.3). In this text, unless stated otherwise, you can assume that these 
requirements are met. 




GUIDELINES

Calculating a Correlation Coefficient
In Words | In Symbols
1. Find the sum of the x-values. | Σx
2. Find the sum of the y-values. | Σy
3. Multiply each x-value by its corresponding y-value and find the sum. | Σxy
4. Square each x-value and find the sum. | Σx²
5. Square each y-value and find the sum. | Σy²
6. Use these five sums to calculate the correlation coefficient. | $r = \dfrac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$

Study Tip
The symbol Σx² means square each value and add the squares. The symbol (Σx)² means add the values and square the sum.
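The six steps translate directly into code. Below is a minimal Python sketch (the function name `correlation_from_sums` is hypothetical) that computes the five sums and then r, applied to the GDP and CO2 data from Example 1. It also prints Σx² next to (Σx)² to show how different the two quantities are.

```python
import math


def correlation_from_sums(xs, ys):
    """Compute Pearson's r by following the six guideline steps.

    Returns the five intermediate sums and r so each step can be checked.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(xs)
    sum_x = sum(xs)                              # Step 1: Σx
    sum_y = sum(ys)                              # Step 2: Σy
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # Step 3: Σxy
    sum_x2 = sum(x * x for x in xs)              # Step 4: Σx² (square first, then add)
    sum_y2 = sum(y * y for y in ys)              # Step 5: Σy²
    # Step 6: combine the sums. sum_x ** 2 is (Σx)², which differs from Σx².
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    if denominator == 0:
        raise ValueError("r is undefined when all x-values or all y-values are equal")
    return {
        "sum_x": sum_x, "sum_y": sum_y, "sum_xy": sum_xy,
        "sum_x2": sum_x2, "sum_y2": sum_y2, "r": numerator / denominator,
    }


# GDP (trillions of dollars) and CO2 emissions (millions of metric tons), Example 1
gdp = [1.8, 1.3, 2.4, 1.5, 3.9, 2.1, 0.9, 1.4, 3.0, 4.6]
co2 = [604.4, 434.2, 544.0, 370.4, 742.3, 340.5, 232.0, 262.3, 441.9, 1157.7]

result = correlation_from_sums(gdp, co2)
print(round(result["sum_x2"], 2), round(result["sum_x"] ** 2, 2))  # 65.49 524.41
print(round(result["r"], 3))                                       # 0.874
```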


EXAMPLE 4
Calculating a Correlation Coefficient


Calculate the correlation coefficient for the gross domestic products and carbon 
dioxide emissions data in Example 1. Interpret the result in the context of the data. 


SOLUTION Use a table to help calculate the correlation coefficient.

GDP (in trillions of dollars), x | CO2 emissions (in millions of metric tons), y | xy | x² | y²
1.8 | 604.4 | 1087.92 | 3.24 | 365,299.36
1.3 | 434.2 | 564.46 | 1.69 | 188,529.64
2.4 | 544.0 | 1305.6 | 5.76 | 295,936
1.5 | 370.4 | 555.6 | 2.25 | 137,196.16
3.9 | 742.3 | 2894.97 | 15.21 | 551,009.29
2.1 | 340.5 | 715.05 | 4.41 | 115,940.25
0.9 | 232.0 | 208.8 | 0.81 | 53,824
1.4 | 262.3 | 367.22 | 1.96 | 68,801.29
3.0 | 441.9 | 1325.7 | 9 | 195,275.61
4.6 | 1157.7 | 5325.42 | 21.16 | 1,340,269.29
Σx = 22.9 | Σy = 5129.7 | Σxy = 14,350.74 | Σx² = 65.49 | Σy² = 3,312,080.89

With these sums and n = 10, the correlation coefficient is

$$\begin{aligned}
r &= \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}\\
&= \frac{10(14{,}350.74) - (22.9)(5129.7)}{\sqrt{10(65.49) - (22.9)^2}\,\sqrt{10(3{,}312{,}080.89) - (5129.7)^2}}\\
&= \frac{26{,}037.27}{\sqrt{130.49}\,\sqrt{6{,}806{,}986.81}}\\
&\approx 0.874 \qquad \text{Round to three decimal places.}
\end{aligned}$$

The result r ≈ 0.874 suggests a strong positive linear correlation.

Interpretation As the gross domestic product increases, the carbon dioxide emissions tend to increase.

Study Tip
The value of r in Example 4 is rounded to three decimal places. This round-off rule will be used throughout the text.


Number of years out of school, x | Annual contribution (in 1000s of $), y
1 | 12.5
10 | 8.7
5 | 14.6
15 | 5.2
3 | 9.9
24 | 3.1
30 | 2.7


To explore this topic further, 
see Activity 9.1 on page 507. 


Tech Tip

Before using the TI-84 Plus to calculate r, make sure the diagnostics feature is on. To turn on this feature, from the home screen, press CATALOG and cursor to DiagnosticOn. Then press ENTER twice.


TRY IT YOURSELF 4 


Calculate the correlation coefficient for the number of years out of school and 
annual contribution data in Try It Yourself 1. Interpret the result in the context 
of the data. 


Answer: Page A38 


EXAMPLE 5
Using Technology to Calculate a Correlation Coefficient


Use technology to calculate the correlation coefficient for the Old Faithful 
data in Example 3. Interpret the result in the context of the data. 


SOLUTION 


Minitab, Excel, the TI-84 Plus, and StatCrunch each have features that allow 
you to calculate a correlation coefficient for paired data sets. Try using this 
technology to find r. You should obtain results similar to the displays shown. 


MINITAB
Correlations: Duration, Time
Pearson correlation of Duration and Time = 0.979   ← Correlation coefficient

EXCEL
26 | =CORREL(A1:A25,B1:B25)
27 | 0.978659213   ← Correlation coefficient

TI-84 PLUS
LinReg
y=ax+b
a=12.48094391
b=33.68290034
r²=…
r=…   ← Correlation coefficient

STATCRUNCH
Correlation between Duration and Time is:
0.97865921   ← Correlation coefficient

Rounded to three decimal places, the correlation coefficient is

r ≈ 0.979.   Round to three decimal places.

This value of r suggests a strong positive linear correlation.


Interpretation As the duration of the eruptions increases, the time until the 
next eruption tends to increase. 


TRY IT YOURSELF 5 


Use technology to calculate the correlation coefficient for the data on page 491 
on the salaries and average attendances at home games for the teams in Major 
League Baseball. Interpret the result in the context of the data. 

Answer: Page A38 




Study Tip
The level of significance is denoted by α, the lowercase Greek letter alpha.

Study Tip
The symbol |r| represents the absolute value of r. Recall that the absolute value of a number is its value, disregarding its sign. For example, |3| = 3 and |-7| = 7.

Study Tip
If you determine that the linear correlation is significant, then you will be able to proceed to write the equation for the line that best describes the data. This line, called the regression line, can be used to predict the value of y when given a value of x. You will learn how to write this equation in the next section.


Using a Table to Test a Population Correlation Coefficient ρ

Once you have calculated r, the sample correlation coefficient, you will want to determine whether there is enough evidence to decide that the population correlation coefficient ρ is significant. In other words, based on a few pairs of data, can you make an inference about the population of all such data pairs? Remember that you are using sample data to make a decision about population data, so it is always possible that your inference may be wrong. In correlation studies, the small percentage of times when you decide that the correlation is significant when it is really not is called the level of significance. It is typically set at α = 0.01 or 0.05. When α = 0.05, about 5% of the time you will decide that the population correlation coefficient is significant when it really is not. (Equivalently, when there is no linear correlation in the population, about 95% of the time you will correctly decide that the correlation is not significant.) When α = 0.01, you will make this type of error only about 1% of the time. When using a lower level of significance, however, you may fail to identify some significant correlations.

For a correlation coefficient to be significant, its absolute value must be sufficiently close to 1. To determine whether the population correlation coefficient ρ is significant, use the critical values given in Table 11 in Appendix B. A portion of the table is shown below. If |r| is greater than the critical value, then there is enough evidence to decide that the correlation is significant. Otherwise, there is not enough evidence to say that the correlation is significant. For instance, to determine whether ρ is significant for five pairs of data (n = 5) at a level of significance of α = 0.01, you need to compare |r| with a critical value of 0.959, as shown in the table.
Number n of pairs of data in sample | Critical value for α = 0.05 | Critical value for α = 0.01
4 | 0.950 | 0.990
5 | 0.878 | 0.959
6 | 0.811 | 0.917


If |r| > 0.959, then the correlation is significant. Otherwise, there is not enough 
evidence to conclude that the correlation is significant. Here are the guidelines 
for this process. 


GUIDELINES

Using Table 11 for the Correlation Coefficient ρ
In Words | In Symbols
1. Determine the number of pairs of data in the sample. | Determine n.
2. Specify the level of significance. | Identify α.
3. Find the critical value. | Use Table 11 in Appendix B.
4. Decide whether the correlation is significant. | If |r| is greater than the critical value, then the correlation is significant. Otherwise, there is not enough evidence to conclude that the correlation is significant.
5. Interpret the decision in the context of the original claim.
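The decision rule in these guidelines can be sketched in a few lines of Python. This is a hedged illustration, not a replacement for Table 11: the dictionary `TABLE_11_EXCERPT` (a hypothetical name) holds only a handful of critical values that appear in this section, keyed by (n, α), and the function name `correlation_is_significant` is also illustrative.

```python
# A few critical values quoted in this section, keyed by (n, alpha).
# This is an excerpt only; use the full Table 11 in Appendix B in practice.
TABLE_11_EXCERPT = {
    (5, 0.01): 0.959,
    (7, 0.01): 0.875,
    (10, 0.05): 0.632,
    (25, 0.05): 0.396,
    (25, 0.01): 0.505,
    (26, 0.05): 0.388,
    (26, 0.01): 0.496,
}


def correlation_is_significant(r, n, alpha):
    """Return True when |r| exceeds the critical value for (n, alpha)."""
    try:
        critical = TABLE_11_EXCERPT[(n, alpha)]  # Step 3: find the critical value
    except KeyError:
        raise KeyError(f"no entry for n={n}, alpha={alpha} in this excerpt") from None
    return abs(r) > critical                     # Step 4: compare |r| with it


print(correlation_is_significant(0.979, 25, 0.05))  # Old Faithful data: True
print(correlation_is_significant(0.874, 10, 0.05))  # GDP and CO2 data: True
```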




To use Table 11 to test a correlation coefficient, note that the requirements 
for calculating a correlation coefficient given on page 496 also apply to the test. In 
this text, unless stated otherwise, you can assume that these requirements are met. 


EXAMPLE 6
Using Table 11 for a Correlation Coefficient

In Example 5, you used 25 pairs of data to find r ≈ 0.979. Is the correlation coefficient significant? Use α = 0.05.

SOLUTION

The number of pairs of data is 25, so n = 25. The level of significance is α = 0.05. Using Table 11, find the critical value in the α = 0.05 column that corresponds to the row with n = 25. The number in that column and row is 0.396.


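n | α = 0.05 | α = 0.01
4 | 0.950 | 0.990
5 | 0.878 | 0.959
6 | 0.811 | 0.917
7 | 0.754 | 0.875
8 | 0.707 | 0.834
9 | 0.666 | 0.798
10 | 0.632 | 0.765
11 | 0.602 | 0.735
12 | 0.576 | 0.708
13 | 0.553 | 0.684
14 | 0.532 | 0.661
15 | 0.514 | 0.641
16 | 0.497 | 0.623
17 | 0.482 | 0.606
18 | 0.468 | 0.590
19 | 0.456 | 0.575
20 | 0.444 | 0.561
21 | 0.433 | 0.549
22 | 0.423 | 0.537
23 | 0.413 | 0.526
24 | 0.404 | 0.515
25 | 0.396 | 0.505   ← n = 25; critical value for α = 0.05
26 | 0.388 | 0.496
27 | 0.381 | 0.487
28 | 0.374 | 0.479
29 | 0.367 | 0.471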


Because |r| ≈ 0.979 > 0.396, you can decide that the population correlation
is significant.

Interpretation There is enough evidence at the 5% level of significance to 
conclude that there is a significant linear correlation between the duration of 
Old Faithful’s eruptions and the time between eruptions. 


TRY IT YOURSELF 6 


In Try It Yourself 4, you calculated the correlation coefficient of the number of years out of school and annual contribution data to be r ≈ -0.908. Is the correlation coefficient significant? Use α = 0.01.

Answer: Page A38 


In Table 11, notice that for fewer data pairs (smaller values of n), stronger
evidence is needed to conclude that the correlation coefficient is significant.




Hypothesis Testing for a Population Correlation Coefficient ρ

You can also use a hypothesis test to determine whether the sample correlation coefficient r provides enough evidence to conclude that the population correlation coefficient ρ is significant. A hypothesis test for ρ can be one-tailed or two-tailed. The null and alternative hypotheses for these tests are listed below.

Left-tailed test: H0: ρ = 0 (no significant negative correlation); Ha: ρ < 0 (significant negative correlation)
Right-tailed test: H0: ρ = 0 (no significant positive correlation); Ha: ρ > 0 (significant positive correlation)
Two-tailed test: H0: ρ = 0 (no significant correlation); Ha: ρ ≠ 0 (significant correlation)

In this text, you will consider only two-tailed hypothesis tests for ρ.


The t-Test for the Correlation Coefficient

A t-test can be used to test whether the correlation between two variables is significant. The test statistic is r and the standardized test statistic

$$t = \frac{r}{\sigma_r} = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$

follows a t-distribution with n - 2 degrees of freedom, where n is the number of pairs of data. (Note that there are n - 2 degrees of freedom because one degree of freedom is lost for each variable.)


GUIDELINES

Using the t-Test for the Correlation Coefficient ρ
In Words | In Symbols
1. Identify the null and alternative hypotheses. | State H0 and Ha.
2. Specify the level of significance. | Identify α.
3. Identify the degrees of freedom. | d.f. = n - 2
4. Determine the critical value(s) and the rejection region(s). | Use Table 5 in Appendix B.
5. Find the standardized test statistic. | $t = \dfrac{r}{\sqrt{(1 - r^2)/(n - 2)}}$
6. Make a decision to reject or fail to reject the null hypothesis. | If t is in the rejection region, then reject H0. Otherwise, fail to reject H0.
7. Interpret the decision in the context of the original claim.


To use the t-test for a correlation coefficient, note that the requirements
for calculating a correlation coefficient given on page 496 also apply to the test.
In this text, unless stated otherwise, you can assume that these requirements
are met.
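The t-test guidelines can be sketched as a small Python function. It assumes you have already read the positive critical value t0 from Table 5 for d.f. = n - 2 and your chosen α; the function name `correlation_t_test` is hypothetical, and the sketch covers only the two-tailed test used in this text.

```python
import math


def correlation_t_test(r, n, t_critical):
    """Two-tailed t-test of H0: rho = 0 versus Ha: rho != 0.

    t_critical is the positive critical value t0 from Table 5 for
    d.f. = n - 2 and the chosen alpha; this sketch does not compute it.
    """
    if n < 3:
        raise ValueError("need at least 3 pairs of data so that d.f. = n - 2 >= 1")
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1 for this formula")
    df = n - 2
    t = r / math.sqrt((1 - r ** 2) / df)  # standardized test statistic
    reject_h0 = abs(t) > t_critical       # rejection regions: t < -t0 or t > t0
    return df, t, reject_h0


# r = 0.874 from 10 pairs (GDP and CO2 data); t0 = 2.306 for d.f. = 8, alpha = 0.05
df, t, reject = correlation_t_test(0.874, 10, 2.306)
print(df, round(t, 3), reject)  # 8 5.087 True
```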


[TI-84 PLUS screen showing the standardized test statistic t for Example 7]


Study Tip 


Be sure you see in 
Example 7 that 
rejecting the null hypothesis means 
that there is enough evidence that 
the correlation is significant. 




EXAMPLE 7
The t-Test for a Correlation Coefficient

In Example 4, you used 10 pairs of data to find r ≈ 0.874. Test the significance
of this correlation coefficient. Use α = 0.05.

SOLUTION 


The null and alternative hypotheses are

H0: ρ = 0 (no correlation) and Ha: ρ ≠ 0 (significant correlation).

Because there are 10 pairs of data in the sample, there are 10 - 2 = 8 degrees of freedom. Because the test is a two-tailed test, α = 0.05, and d.f. = 8, the critical values are -t0 = -2.306 and t0 = 2.306. The rejection regions are t < -2.306 and t > 2.306. Using the t-test, the standardized test statistic is

$$\begin{aligned}
t &= \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} && \text{Use the } t\text{-test for } \rho.\\
&= \frac{0.874}{\sqrt{\dfrac{1 - (0.874)^2}{10 - 2}}} && \text{Substitute 0.874 for } r \text{ and 10 for } n.\\
&\approx 5.087 && \text{Round to three decimal places.}
\end{aligned}$$


You can check this result using technology. For instance, using a TI-84 Plus, 
you can find the standardized test statistic, as shown at the left. (Note that the 
result differs slightly due to rounding.) The figure below shows the location of 
the rejection regions and the standardized test statistic. 


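[Figure: t-distribution showing the rejection regions t < -t0 = -2.306 and t > t0 = 2.306, with the standardized test statistic t ≈ 5.087 in the right rejection region]

Because t is in the rejection region, you reject the null hypothesis.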


Interpretation There is enough evidence at the 5% level of significance to 
conclude that there is a significant linear correlation between gross domestic 
products and carbon dioxide emissions. 


TRY IT YOURSELF 7 


In Try It Yourself 5, you calculated the correlation coefficient of the salaries
and average attendances at home games for the teams in Major League
Baseball to be r ≈ 0.775. Test the significance of this correlation coefficient.
Use α = 0.01.

Answer: Page A38 


In Example 7, you can use Table 11 in Appendix B to test the population
correlation coefficient ρ. Given n = 10 and α = 0.05, the critical value from
Table 11 is 0.632. Because |r| ≈ 0.874 > 0.632, the correlation is significant.
Note that this is the same result you obtained using a t-test for the population
correlation coefficient ρ.
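The agreement is not a coincidence. Solving the t-test formula for r shows that |t| > t0 exactly when |r| > t0 / √(t0² + n - 2), so a Table 11 entry can be recovered from the matching two-tailed Table 5 critical value. The minimal Python sketch below (the function name `critical_r_from_t` is hypothetical) takes t0 as an input: 2.306 is the value used in Example 7, while 5.841 (d.f. = 3, α = 0.01) and 2.069 (d.f. = 23, α = 0.05) are standard two-tailed t critical values assumed here.

```python
import math


def critical_r_from_t(t_critical, n):
    """Convert a two-tailed t critical value (d.f. = n - 2) to a critical value for |r|.

    Squaring t = r / sqrt((1 - r**2) / (n - 2)) and solving for r gives
    r = t / sqrt(t**2 + n - 2).
    """
    df = n - 2
    return t_critical / math.sqrt(t_critical ** 2 + df)


print(round(critical_r_from_t(2.306, 10), 3))  # 0.632 (n = 10, alpha = 0.05)
print(round(critical_r_from_t(5.841, 5), 3))   # 0.959 (n = 5, alpha = 0.01)
print(round(critical_r_from_t(2.069, 25), 3))  # 0.396 (n = 25, alpha = 0.05)
```

This relationship also explains the earlier observation that fewer data pairs demand a larger |r|: as n shrinks, t0 grows and n - 2 shrinks, both of which push the critical r toward 1.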




Picturing the World

The scatter plot shows the results of a survey conducted as a group project by students in a high school statistics class in the San Francisco area. In the survey, 125 high school students were asked their grade point average (GPA) and the number of caffeine drinks they consumed each day.

[Figure: scatter plot of GPA (5-point scale) versus Caffeine drinks (cups per day)]

What type of correlation, if any, does the scatter plot show between caffeine consumption and GPA?


Correlation and Causation 


The fact that two variables are strongly correlated does not in itself imply a 
cause-and-effect relationship between the variables. More in-depth study is 
usually needed to determine whether there is a causal relationship between the 
variables. 

When there is a significant correlation between two variables, a researcher 
should consider these possibilities. 


1. Is there a direct cause-and-effect relationship between the variables?

That is, does x cause y? For instance, consider the relationship between gross domestic products and carbon dioxide emissions that has been discussed throughout this section. It is reasonable to conclude that an increase in a country’s gross domestic product will result in higher carbon dioxide emissions.

2. Is there a reverse cause-and-effect relationship between the variables?

That is, does y cause x? For instance, consider the Old Faithful data that have been discussed throughout this section. These variables have a positive linear correlation, and it is possible to conclude that the duration of an eruption affects the time before the next eruption. However, it is also possible that the time between eruptions affects the duration of the next eruption.


3. Is it possible that the relationship between the variables can be 
caused by a third variable or perhaps a combination of several other 
variables? 


For instance, consider the salaries and average attendances per home 
game for the teams in Major League Baseball listed on page 491. 
Although these variables have a positive linear correlation, it is 
doubtful that just because a team’s salary decreases, the average 
attendance per home game will also decrease. The relationship is 
probably due to other variables, such as the economy, the players on 
the team, and whether or not the team is winning games. Variables that 
have an effect on the variables being studied but are not included in 
the study are called lurking variables. 


4. Is it possible that the relationship between two variables may be a 
coincidence? 


For instance, although it may be possible to find a significant 
correlation between the number of animal species living in certain 
regions and the number of people who own more than two cars in 
those regions, it is highly unlikely that the variables are directly related. 
The relationship is probably due to coincidence. 


Determining which of the cases above is valid for a data set can be
difficult. Consider this example. A person breaks out in a rash after
eating shrimp at a certain restaurant. This happens every time the person eats
shrimp at the restaurant. The natural conclusion is that the person is allergic to
shrimp. However, upon further study by an allergist, it is found that the person
is not allergic to shrimp, but to a type of seasoning the chef puts on
the shrimp.


SECTION 9.1 Correlation 503 


9.1 EXERCISES

For Extra Help: MyLab Statistics


Building Basic Skills and Vocabulary 


1. Two variables have a positive linear correlation. Does the dependent 
variable increase or decrease as the independent variable increases? What if 
the variables have a negative linear correlation? 


2. Describe the range of values for the correlation coefficient. 


3. What does the sample correlation coefficient r measure? Which value
indicates a stronger correlation: r = 0.918 or r = −0.932? Explain your
reasoning.


4. Give examples of two variables that have perfect positive linear correlation 
and two variables that have perfect negative linear correlation. 


5. Explain how to determine whether a sample correlation coefficient indicates 
that the population correlation coefficient is significant. 


6. Discuss the difference between r and ρ.


7. What are the null and alternative hypotheses for a two-tailed t-test for
the population correlation coefficient ρ? When do you reject the null
hypothesis?


8. In your own words, what does it mean to say “correlation does not 
imply causation”? List a pair of variables that have correlation but no 
cause-and-effect relationship. 


Graphical Analysis In Exercises 9-12, determine whether there is a perfect
positive linear correlation, a strong positive linear correlation, a perfect negative 
linear correlation, a strong negative linear correlation, or no linear correlation 
between the variables. 


9. [Scatter plot of y versus x]     10. [Scatter plot of y versus x]
11. [Scatter plot of y versus x]    12. [Scatter plot of y versus x]


In Exercises 13 and 14, identify the explanatory variable and the response variable. 


13. A nutritionist wants to determine whether the amounts of water consumed 
each day by persons of the same weight and on the same diet can be used to 
predict individual weight loss. 


14. An actuary at an insurance company wants to determine whether the 
number of hours of safety driving classes can be used to predict the number 
of driving accidents for each driver. 




Graphical Analysis In Exercises 15-18, the scatter plots show the results of 
a survey of 20 randomly selected males ages 24-35. Using age as the explanatory 
variable, match each graph with the appropriate description. Explain your reasoning. 


(a) Age and body temperature (b) Age and balance on student loans 
(c) Age and income (d) Age and height 
15. [Scatter plot of y versus Age, ages 26 to 34]    16. [Scatter plot of y versus Age, ages 26 to 34]
17. [Scatter plot of y versus Age, ages 26 to 34]    18. [Scatter plot of y versus Age, ages 26 to 34]


In Exercises 19-22, two variables are given that have been shown to have 
correlation but no cause-and-effect relationship. Describe at least one possible 
reason for the correlation. 


19. Value of home and life span
20. Alcohol use and tobacco use
21. Ice cream sales and homicide rates


22. Marriage rate in Kentucky and number of deaths caused by falling out of a 
fishing boat 


Using and Interpreting Concepts 


Constructing a Scatter Plot and Determining Correlation In
Exercises 23-28, (a) display the data in a scatter plot, (b) calculate the sample
correlation coefficient r, (c) describe the type of correlation, if any, and interpret
the correlation in the context of the data, and (d) use Table 11 in Appendix B to
make a conclusion about the correlation coefficient. If convenient, use technology.
Let α = 0.01.


23. Age and Vocabulary The ages (in years) of 11 children and the
numbers of words in their vocabulary

Age, x                 1    2    3     4     5     6    3     5    2     4     6
Vocabulary size, y     3  220  540  1100  2100  2600  730  2200  260  1200  2500


24. Weight and Waist The weights (in kilograms) of 8 males and the
circumference of their waists (in centimeters)

Weight, x     96.9   73.1   83.1   86.5   64.1   118.8   71.3   122.7
Waist, y     107.8   97.2   95.1  112.0   78.0   112.0   95.0   118.0




25. Maximal Strength and Jump Height The maximum weights (in
kilograms) for which one repetition of a half squat can be performed
and the jump heights (in centimeters) for 12 international soccer
players (Adapted from British Journal of Sports Medicine)

Maximum weight, x   190  185  155  180  175  170
Jump height, y       60   57   54   60   56   64

Maximum weight, x   150  160  160  180  190  210
Jump height, y       52   51   49   57   59   64
26. Maximal Strength and Sprint Performance The maximum weights (in
kilograms) for which one repetition of a half squat can be performed
and the times (in seconds) to run a 10-meter sprint for 12 international
soccer players (Adapted from British Journal of Sports Medicine)

Maximum weight, x    175   180   155   210   150   190
Time, y             1.80  1.77  2.05  1.42  2.04  1.61

Maximum weight, x    185   160   190   180   160   170
Time, y             1.70  1.91  1.60  1.63  1.98  1.90
27. Earnings and Dividends The earnings per share (in dollars) and the
dividends per share (in dollars) for 6 companies in a recent year (Source: The
Value Line Investment Survey)

Earnings per share, x    1.22  4.00  3.53  8.21  1.74  3.14
Dividends per share, y   0.90  0.31  2.10  1.00  0.55  1.48


28. Speed of Sound Eleven altitudes (in thousands of feet) and the
speeds of sound (in feet per second) at these altitudes

Altitude, x              0       5      10      15      20      25
Speed of sound, y   1116.3  1096.9  1077.3  1057.2  1036.8  1015.8

Altitude, x             30      35      40      45      50
Speed of sound, y    994.5   969.0   967.7   967.7   967.7


29. In Exercise 23, remove data for a child who is 1 year old and has a vocabulary 
size of 3 words from the data set. Describe how this affects the correlation 
coefficient r. 


30. In Exercise 24, add data for a male with a weight of 105.6 kilograms and a 
waist size of 98.3 centimeters to the data set. Describe how this affects the 
correlation coefficient r. 


31. In Exercise 25, add data for an international soccer player with a maximum 
weight of 180 kilograms and a jump height of 50 centimeters to the data set. 
Describe how this affects the correlation coefficient r. 


32. In Exercise 26, remove the data for the international soccer player who can 
perform the half squat with a maximum of 210 kilograms and can sprint 
10 meters in 1.42 seconds from the data set. Describe how this affects the 
correlation coefficient r. 




The t-Test for Correlation Coefficients In Exercises 33-36, perform
a hypothesis test using Table 5 in Appendix B to make a conclusion about the
correlation coefficient.

33. Braking Distances: Dry Surface The weights (in pounds) of eight vehicles
and the variabilities of their braking distances (in feet) when stopping on a
dry surface are shown in the table. At α = 0.01, is there enough evidence to
conclude that there is a significant linear correlation between vehicle weight
and variability in braking distance on a dry surface? (Adapted from National
Highway Traffic Safety Administration)

Weight, x        5940  5340  6500  5100  5850  4800  5600  5890
Variability, y   1.78  1.93  1.91  1.59  1.66  1.50  1.61  1.70

34. Braking Distances: Wet Surface The weights (in pounds) of eight vehicles
and the variabilities of their braking distances (in feet) when stopping on a
wet surface are shown in the table. At α = 0.05, is there enough evidence to
conclude that there is a significant linear correlation between vehicle weight
and variability in braking distance on a wet surface? (Adapted from National
Highway Traffic Safety Administration)

Weight, x        5890  5340  6500  4800  5940  5600  5100  5850
Variability, y   2.92  2.40  4.09  1.72  2.88  2.53  2.32  2.78

35. Maximal Strength and Jump Height The table in Exercise 25 shows the
maximum weights (in kilograms) for which one repetition of a half squat
can be performed and the jump heights (in centimeters) for 12 international
soccer players. At α = 0.05, is there enough evidence to conclude that there
is a significant linear correlation between the data? (Use the value of r found
in Exercise 25.)

36. Maximal Strength and Sprint Performance The table in Exercise 26 shows
the maximum weights (in kilograms) for which one repetition of a half squat
can be performed and the times (in seconds) to run a 10-meter sprint for
12 international soccer players. At α = 0.01, is there enough evidence to
conclude that there is a significant linear correlation between the data? (Use
the value of r found in Exercise 26.)


Extending Concepts 


37. Interchanging x and y In Exercise 26, let the time (in seconds) to sprint
10 meters represent the x-values and the maximum weight (in kilograms)
for which one repetition of a half squat can be performed represent the
y-values. Calculate the correlation coefficient r. What effect does switching
the explanatory and response variables have on the correlation coefficient?

38. Writing Use your school’s library, the Internet, or some other reference
source to find a real-life data set with the indicated cause-and-effect
relationship. Write a paragraph describing each variable and explain why
you think the variables have the indicated cause-and-effect relationship.

(a) Direct Cause-and-Effect: Changes in one variable cause changes in the
other variable.

(b) Other Factors: The relationship between the variables is caused by a
third variable.

(c) Coincidence: The relationship between the variables is a coincidence.


Correlation by Eye

ACTIVITY

APPLET You can find the interactive applet for this activity within
MyLab Statistics or at www.pearsonglobaleditions.com.

The correlation by eye applet allows you to guess the sample correlation
coefficient r for a data set. When the applet loads, a data set consisting
of 20 points is displayed. Points can be added to the plot by clicking the
mouse. Points on the plot can be removed by clicking on the point and then
dragging the point into the trash can. All of the points on the plot can
be removed by simply clicking inside the trash can. You can enter your
guess for r in the “Guess” field, and then click SHOW R! to see whether
you are within 0.1 of the true value. When you click NEW DATA, a new
data set is generated.

[Applet window: scatter plot with axes from 35 to 60, “Guess:” and “True r:” fields, and NEW DATA and SHOW R! buttons]


EXPLORE 


Step 1 Add five points to the plot. 

Step 2 Enter a guess for r. 

Step 3 Click SHOW R!. 

Step 4 Click NEW DATA. 

Step 5 Remove five points from the plot. 
Step 6 Enter a guess for r. 

Step 7 Click SHOW R!. 


DRAW CONCLUSIONS 


1. Generate a new data set. Using your knowledge of correlation, try to guess 
the value of r for the data set. Repeat this 10 times. How many times were 
you correct? Describe how you chose each r value. 


2. Describe how to create a data set with a value of r that is approximately 1.

3. Describe how to create a data set with a value of r that is approximately 0.

4. Try to create a data set with a value of r that is approximately −0.9. Then
try to create a data set with a value of r that is approximately 0.9. What did
you do differently to create the two data sets?




9.2 Linear Regression

What You Should Learn
» How to find the equation of a regression line
» How to predict y-values using a regression equation

Regression Lines ■ Applications of Regression Lines

Regression Lines

After verifying that the linear correlation between two variables is significant,
the next step is to determine the equation of the line that best models the data.
This line is called a regression line, and its equation can be used to predict the
value of y for a given value of x. Although many lines can be drawn through a
set of points, a regression line is determined by specific criteria.

Consider the scatter plot and the line shown below. For each data point, dᵢ
represents the difference between the observed y-value and the predicted y-value
for a given x-value. These differences are called residuals and can be positive,
negative, or zero. When the point is above the line, dᵢ is positive. When the point
is below the line, dᵢ is negative. When the observed y-value equals the predicted
y-value, dᵢ = 0. Of all possible lines that can be drawn through a set of points,
the regression line is the line for which the sum of the squares of all the residuals

$$\sum d_i^2 \qquad \text{(sum of the squares of the residuals)}$$

is a minimum.

[Figure: scatter plot with a line, marking the observed y-value and the predicted y-value for a given x-value, where dᵢ = (observed y-value) − (predicted y-value)]


DEFINITION

A regression line, also called a line of best fit, is the line for which the sum of
the squares of the residuals is a minimum.


In algebra, you learned that you can write an equation of a line by finding
its slope m and y-intercept b. The equation has the form

y = mx + b.

Recall that the slope of a line is the ratio of its rise over its run and the
y-intercept is the y-value of the point at which the line crosses the y-axis. It is
the y-value when x = 0. For instance, the graph of y = 2x + 1 is shown in the
figure at the right. The slope of the line is 2 and the y-intercept is 1.

[Figure: graph of y = 2x + 1]

In algebra, you used two points to determine the equation of a line. In
statistics, you will use every point in the data set to determine the equation of
the regression line.

Study Tip
When determining the equation of a regression line, it is helpful to construct
a scatter plot of the data to check for outliers, which can greatly influence a
regression line. You should also check for gaps and clusters in the data.


Tech Tip
Although formulas for the slope and y-intercept are given, it is more
convenient to use technology to calculate the equation of a regression line
(see Example 2).


GDP (in trillions     CO₂ emissions (in millions
of dollars), x        of metric tons), y
1.8                   604.4
1.3                   434.2
2.4                   544.0
1.5                   370.4
3.9                   742.3
2.1                   340.5
0.9                   232.0
1.4                   262.3
3.0                   441.9
4.6                   1157.7


Study Tip
When writing the equation of a regression line, the slope m and the
y-intercept b are rounded to three decimal places, as shown in Example 1.
This round-off rule will be used throughout the text.


SECTION 9.2 Linear Regression 509 


The equation of a regression line allows you to use the independent
(explanatory) variable x to make predictions for the dependent (response)
variable y.

The Equation of a Regression Line

The equation of a regression line for an independent variable x and a
dependent variable y is

ŷ = mx + b

where ŷ is the predicted y-value for a given x-value. The slope m and
y-intercept b are given by

$$m = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2} \qquad\text{and}\qquad b = \bar{y} - m\bar{x} = \frac{\sum y}{n} - m\,\frac{\sum x}{n}$$

where ȳ is the mean of the y-values in the data set, x̄ is the mean of the
x-values, and n is the number of pairs of data. The regression line always
passes through the point (x̄, ȳ).


EXAMPLE 1
Finding the Equation of a Regression Line

Find the equation of the regression line for the gross domestic products and
carbon dioxide emissions data used in Section 9.1. (See table at the left.)

SOLUTION

Recall from Example 7 of Section 9.1 that there is a significant linear correlation
between gross domestic products and carbon dioxide emissions. Also, in
Example 4 of Section 9.1, you found that n = 10, Σx = 22.9, Σy = 5129.7,
Σxy = 14,350.74, and Σx² = 65.49. You can use these values to calculate the
slope m of the regression line

$$m = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2} = \frac{10(14{,}350.74) - (22.9)(5129.7)}{10(65.49) - (22.9)^2} \approx 199.534600$$

and its y-intercept b.

$$b = \bar{y} - m\bar{x} \approx \frac{5129.7}{10} - (199.534600)\left(\frac{22.9}{10}\right) \approx 56.036$$

So, the equation of the regression line is
ŷ = 199.535x + 56.036.

To sketch the regression line, first choose two x-values between the least and
greatest x-values in the data set. Next, calculate the corresponding ŷ-values
using the regression equation. Then draw a line through the two points. The
regression line and scatter plot of the data are shown at the right. Notice that
the line passes through the point (x̄, ȳ) = (2.29, 512.97).

[Figure: scatter plot of CO₂ emissions (in millions of metric tons) versus GDP (in trillions of dollars), with the regression line]
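The arithmetic in Example 1 can be reproduced from the five summary values alone. The sketch below plugs them into the slope and intercept formulas and applies the three-decimal round-off rule only at the end, as the Study Tip describes.

```python
# Summary values for the GDP and CO2 emissions data (from Example 1)
n = 10
sum_x, sum_y = 22.9, 5129.7
sum_xy, sum_x2 = 14_350.74, 65.49

m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)
b = sum_y / n - m * (sum_x / n)  # b = y-bar - m * x-bar

print(f"m = {m:.6f}")                 # m = 199.534600
print(f"b = {b:.3f}")                 # b = 56.036
print(f"y-hat = {m:.3f}x + {b:.3f}")  # y-hat = 199.535x + 56.036

# The line passes through (x-bar, y-bar) = (2.29, 512.97).
print(round(m * (sum_x / n) + b, 2))  # 512.97
```

Computing b with the unrounded slope, as done here and in the text, avoids carrying rounding error from m into b.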




Duration, x    Time, y
1.80           56
1.82           58
1.90           62
1.93           56
1.98           57
2.05           57
2.13           60
2.30           57
2.37           61
2.82           73
3.13           76
3.27           77
3.65           77




To explore this topic further, 
see Activity 9.2 on page 518. 


Duration, 


x 
3.78 
3.83 
3.88 
4.10 
4.27 
4.30 
4.43 
4.47 
4.53 
4.55 
4.60 
4.63 


Time, 


y 
79 


TRY IT YOURSELF 1 


Find the equation of the regression line for the number of years out of school 
and annual contribution data used in Try It Yourself 4 in Section 9.1. 
Answer: Page A38 


EXAMPLE 2
Using Technology to Find a Regression Equation


Use technology to find the equation of the regression line for the Old Faithful 
data used in Section 9.1. (See table at the left.) 


SOLUTION 


Recall from Example 6 of Section 9.1 that there is a significant linear 
correlation between the duration of Old Faithful’s eruptions and the time 
between eruptions. Minitab, Excel, and the TI-84 Plus each have features that 
calculate a regression equation. Try using this technology to find the regression 
equation. You should obtain results similar to the displays shown below. 


MINITAB

Regression Analysis: Time versus Duration

Coefficients
Term        Coef    SE Coef   T-Value   P-Value
Constant    33.68   1.89      17.79     0.000
Duration    12.481  0.546     22.84     0.000

Regression Equation
Time = 33.68 + 12.481 Duration

TI-84 PLUS

LinReg
y=ax+b
a=12.48094391
b=33.68290034
r²=.9577738551
r=.9786592129


From the displays, you can see that the regression
equation is

ŷ = 12.481x + 33.683.


The TI-84 Plus display at the right shows the
regression line and a scatter plot of the data in
the same viewing window. To do this, use the
Stat Plot feature to construct the scatter plot and
enter the regression equation as Y1.

TI-84 PLUS
[Screen: scatter plot of the Old Faithful data with the regression line]


TRY IT YOURSELF 2 


Use technology to find the equation of the regression line for the salaries and 
average attendances at home games for the teams in Major League Baseball 
listed on page 491. 

Answer: Page A38 


Picturing the World


The scatter plot shows the 
relationship between the number 
of farms (in thousands) in a state 
and the total value of the farms 
(in billions of dollars). (Source: 
U.S. Department of Agriculture, National 
Agriculture Statistics Service) 


[Scatter plot: Total value (in billions of dollars) versus Farms (in thousands), r ≈ 0.810]


Describe the correlation between
these two variables in words.

Use the scatter plot to predict
the total value of farms in a
state that has 150,000 farms. The
regression line for this scatter plot
is ŷ = 1.014x + 2.611. Use this
equation to predict the total value
in a state that has 150,000 farms
(x = 150). (Assume x and y have
a significant linear correlation.)

How does your algebraic prediction
compare with your graphical one?




Applications of Regression Lines 


When the correlation between x and y is significant (see Section 9.1), the 
equation of a regression line can be used to predict y-values for certain x-values. 
Predicted values are meaningful only for x-values in (or close to) the range of the
observed x-values in the data. For instance, in Example 1 the observed x-values 
in the data range from $0.9 trillion to $4.6 trillion. So, it would not be appropriate 
to use the regression equation found in Example 1 to predict carbon dioxide 
emissions for gross domestic products such as $0.2 trillion or $14.5 trillion. 

To predict y-values, substitute an x-value into the regression equation, then
calculate ŷ, the predicted y-value. This process is shown in the next example.


EXAMPLE 3
Predicting y-Values Using Regression Equations
The regression equation for the gross domestic products (in trillions of dollars)
and carbon dioxide emissions (in millions of metric tons) data is
ŷ = 199.535x + 56.036. (See Example 1.)
Use this equation to predict the expected carbon dioxide emissions for each
gross domestic product.
1. $1.2 trillion
2. $2.0 trillion
3. $2.6 trillion


SOLUTION 


Recall from Section 9.1, Example 7, that x and y have a significant linear 
correlation. So, you can use the regression equation to predict y-values. 
Note that the given gross domestic products are in the range ($0.9 trillion to 
$4.6 trillion) of the observed x-values. To predict the expected carbon dioxide 
emissions, substitute each gross domestic product for x in the regression 
equation. Then calculate ŷ.


1. ŷ = 199.535x + 56.036
     = 199.535(1.2) + 56.036
     = 295.478
   Interpretation When the gross domestic product is $1.2 trillion, the
   predicted CO₂ emissions are 295.478 million metric tons.

2. ŷ = 199.535x + 56.036
     = 199.535(2.0) + 56.036
     = 455.106
   Interpretation When the gross domestic product is $2.0 trillion, the
   predicted CO₂ emissions are 455.106 million metric tons.

3. ŷ = 199.535x + 56.036
     = 199.535(2.6) + 56.036
     = 574.827
   Interpretation When the gross domestic product is $2.6 trillion, the
   predicted CO₂ emissions are 574.827 million metric tons.


TRY IT YOURSELF 3 


The regression equation for the Old Faithful data is ŷ = 12.481x + 33.683.
Use this to predict the time until the next eruption for each eruption duration.
(Recall from Section 9.1, Example 6, that x and y have a significant linear
correlation.)
1. 2 minutes
2. 3.32 minutes
Answer: Page A38


When the correlation between x and y is not significant, the best predicted
y-value is ȳ, the mean of the y-values in the data.
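The prediction rules in this section fit in one small function: use ŷ = mx + b only when the correlation is significant and x lies in (or close to) the observed range, and fall back to ȳ when the correlation is not significant. In the sketch below, the name `predict` and its `margin` parameter are hypothetical; how far "close to" the range may stretch is a judgment call that the sketch leaves to the caller (default 0, a strict check). The usage reproduces Example 3.

```python
import statistics

def predict(x, xs, ys, m, b, significant, margin=0.0):
    """Best predicted y-value for x under this section's rules.

    margin is a hypothetical allowance for x-values 'close to' the observed range.
    """
    if not significant:
        # No significant linear correlation: the best prediction is y-bar.
        return statistics.fmean(ys)
    if not (min(xs) - margin <= x <= max(xs) + margin):
        raise ValueError(f"x = {x} is outside the observed x-values; prediction not meaningful")
    return m * x + b

gdp = [1.8, 1.3, 2.4, 1.5, 3.9, 2.1, 0.9, 1.4, 3.0, 4.6]
co2 = [604.4, 434.2, 544.0, 370.4, 742.3, 340.5, 232.0, 262.3, 441.9, 1157.7]

for x in (1.2, 2.0, 2.6):
    print(x, round(predict(x, gdp, co2, 199.535, 56.036, significant=True), 3))
# 1.2 295.478
# 2.0 455.106
# 2.6 574.827
```

Calling `predict(14.5, gdp, co2, 199.535, 56.036, significant=True)` raises a `ValueError`, matching the text's warning against predicting emissions for a $14.5 trillion GDP.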




9.2 EXERCISES 


For Extra Help: MyLab Statistics 


Building Basic Skills and Vocabulary 
1. What is a residual? Explain when a residual is positive, negative, and zero. 


2. Two variables have a positive linear correlation. Is the slope of the regression 
line for the variables positive or negative? 


3. Explain how to predict y-values using the equation of a regression line. 


4. For a set of data and a corresponding regression line, describe all values of x 
that provide meaningful predictions for y. 


5. In order to predict y-values using the equation of a regression line, what 
must be true about the correlation coefficient of the variables? 


6. Why is it not appropriate to use a regression line to predict y-values for 
x-values that are not in (or close to) the range of x-values found in the data? 


In Exercises 7-12, match the description in the left column with its symbol(s) in
the right column.

7. The y-value of a data point corresponding to xᵢ                      a. ŷᵢ
8. The y-value for a point on the regression line corresponding to xᵢ   b. yᵢ
9. Slope
10. y-intercept
11. The mean of the y-values
12. The point a regression line always passes through


Graphical Analysis In Exercises 13-16, match the regression equation with
the appropriate graph.

13. ŷ = −1.361x + 21.952     14. ŷ = 2.115x + 21.958
15. ŷ = 2.125x + 9.588       16. ŷ = −0.705x + 27.214

a. [Scatter plot with regression line]    b. [Scatter plot with regression line]
c. [Scatter plot with regression line]    d. [Scatter plot with regression line]


Square meters, x    Sale price, y
36                  1782.5
71                  380
100                 450
40                  525
65                  750
90                  900
85                  1150


TABLE FOR EXERCISE 18 


Electrocardiogram
[Figure: electrocardiogram trace with the QT interval marked]

The QT interval is a measure of
electrical waves of the heart. A
lengthened QT interval can indicate
heart health problems.

FIGURE FOR EXERCISE 21




Using and Interpreting Concepts 


Finding the Equation of a Regression Line In Exercises 17-26, find the
equation of the regression line for the data. Then construct a scatter plot of the data 
and draw the regression line. (Each pair of variables has a significant correlation.) 
Then use the regression equation to predict the value of y for each of the x-values, 
if meaningful. If the x-value is not meaningful to predict the value of y, explain why 
not. If convenient, use technology. 


17. Number of Athletes and Medals Won The number of athletes participating
and the number of medals won in the Olympics by Japan through the last nine
years (Source: Olympian Database)

Number of athletes, x   84  96  48  90  719  81  58  64  29
Medals won, y           13  41   8  38    5  25   1  37   2

(a) x = 75 athletes    (b) x = 80 athletes
(c) x = 85 athletes    (d) x = 120 athletes


18. Square Meters and Office Sale Price The square meters and sale prices (in 
thousands of Egyptian Pounds) of seven offices in Egypt are shown in the 
table at the left. (Source: RE/MAX Egypt) 


(a) x = 45 square meters (b) x = 55 square meters 


(c) x = 95 square meters (d) x = 125 square meters 


19. Hours Studying and Test Scores The numbers of hours 9 students spent
studying for a test and their scores on that test

Hours spent studying, x    0   1   1   2   4   6   7   7   8
Test scores, y            45  55  60  64  68  79  85  94  89

(a) x = 3.5 hours    (b) x = 5 hours
(c) x = 7.5 hours    (d) x = 14 hours


20. Goals and Wins The number of goals scored and the number of
wins for the top 10 teams in the 2016-2017 English Premier League
season (Source: Premier League)

Goals, x   85  86  80  78  77  54  62  41  55  43
Wins, y    30  26  23  22  23  18  17  12  12  12

(a) x = 50 goals    (b) x = 70 goals
(c) x = 75 goals    (d) x = 95 goals

21. Heart Rate and QT Interval The heart rates (in beats per minute)
and QT intervals (in milliseconds) for 13 males (The figure at the
left shows the QT interval of a heartbeat in an electrocardiogram.)
(Adapted from Chest)

Heart rate, x     60   75   62   68   84   97   66
QT interval, y   403  363  381  367  341  317  401

Heart rate, x     65   86   78   93   75   88
QT interval, y   384  342  377  329  377  349

(a) x = 120 beats per minute    (b) x = 67 beats per minute
(c) x = 90 beats per minute     (d) x = 83 beats per minute




22. Length and Girth of Harbor Seals The lengths (in centimeters) and
girths (in centimeters) of 12 harbor seals (Adapted from Moss Landing
Marine Laboratories)

Length, x   137  168  152  145  159  159
Girth, y    106  130  116  106  125  119

Length, x   124  137  155  148  147  146
Girth, y    103  104  120  110  107  109

(a) x = 140 centimeters    (b) x = 172 centimeters
(c) x = 164 centimeters    (d) x = 158 centimeters

23. Hot Dogs: Caloric and Sodium Content The caloric contents and
the sodium contents (in milligrams) of 12 brands of beef hot dogs
(Source: Walmart)

Calories, x   180  220  230   90  160  190
Sodium, y     510  740  740  280  530  580

Calories, x   150  110  110  160  140  150
Sodium, y     490  480  330  640  480  460

(a) x = 170 calories    (b) x = 100 calories
(c) x = 260 calories    (d) x = 210 calories

24. Employees and Revenue The number of employees and the 2016
revenue (in millions of dollars) of 13 hotel and gaming companies
(Source: Value Line)

Employees, x   1800   8300  45,000   7300  52,000  10,000  20,200
Revenue, y      925   1271    4429   1006    9455    1811    4519

Employees, x  13,700  5200  24,600  19,900   4000  18,800
Revenue, y      1452  1601    4466    2184   1309    3034

(a) x = 32,500 employees    (b) x = 6000 employees
(c) x = 1350 employees      (d) x = 47,000 employees

25. Shoe Size and Height The shoe sizes and heights (in inches) of 14 men

Shoe size, x    8.5   9.0   9.0   9.5  10.0  10.0  10.5
Height, y      66.0  68.5  67.5  70.0  70.0  72.0  71.5

Shoe size, x   10.5  11.0  11.0  11.0  12.0  12.0  12.5
Height, y      69.5  71.5  72.0  73.0  73.5  74.0  74.0

(a) x = size 11.5    (b) x = size 8.0
(c) x = size 15.5    (d) x = size 10.0




26. Age and Hours Slept The ages (in years) of 10 infants and the
numbers of hours each slept in a day

Age, x            0.1   0.2   0.4   0.7   0.6   0.9
Hours slept, y   14.5  14.3  14.1  13.9  13.9  13.7

Age, x            0.1   0.2   0.4   0.9
Hours slept, y   14.3  14.2  14.0  13.8

(a) x = 0.3 year    (b) x = 3.9 years
(c) x = 0.6 year    (d) x = 0.8 year


Registered Nurse Salaries In Exercises 27-30, use the table, which
shows the years of experience of 14 registered nurses and their annual salaries
(in thousands of dollars). (Adapted from Payscale, Inc.)

Years of experience, x
Annual salary (in thousands of dollars), y   45.2  49.9  54.7  59.3  61.4  62.9  66.0

Years of experience, x
Annual salary (in thousands of dollars), y   67.1  65.3  68.4  70.6  69.5  73.9  71.6


27. Correlation Using the scatter plot of the registered nurse salary data
shown below, what type of correlation, if any, do you think the data
have? Explain.

[Scatter plot “Registered Nurses”: Annual salary (in thousands of dollars) versus Years of experience]

28. Regression Line Find an equation of the regression line for the data.
Sketch a scatter plot of the data and draw the regression line.

29. Using the Regression Line An analyst used the regression line you found in
Exercise 28 to predict the annual salary for a registered nurse with 28 years
of experience. Is this a valid prediction? Explain your reasoning.

30. Significant Correlation? A salary analyst claims that the population has a
significant correlation for α = 0.01. Test this claim.




Extending Concepts 


Interchanging x and y In Exercises 31 and 32, perform the steps below. 


(a) Find the equation of the regression line for the data, letting Row 1 represent the x-values and Row 2 the y-values. Sketch a scatter plot of the data and draw the regression line.

(b) Find the equation of the regression line for the data, letting Row 2 represent the x-values and Row 1 the y-values. Sketch a scatter plot of the data and draw the regression line.


(c) Describe the effect of switching the explanatory and response variables on the 
regression line. 


31.
Row 1   0   1   2   3   3   5   5   5   6   7
Row 2   96  85  82  74  95  68  76  84  58  65

32.
Row 1   16   25   39   45   49   64   70
Row 2   109  122  143  132  199  185  199


Residual Plots A residual plot allows you to assess correlation data and check for possible problems with a regression model. To construct a residual plot, make a scatter plot of \((x, y - \hat{y})\), where \(y - \hat{y}\) is the residual of each y-value. If the resulting plot shows any type of pattern, then the regression line is not a good representation of the relationship between the two variables. If it does not show a pattern (that is, if the residuals fluctuate about 0), then the regression line is a good representation. Be aware that if a point on the residual plot appears to be outside the pattern of the other points, then it may be an outlier.


In Exercises 33 and 34, (a) find the equation of the regression line, (b) construct a 
scatter plot of the data and draw the regression line, (c) construct a residual plot, 
and (d) determine whether there are any patterns in the residual plot and explain 
what they suggest about the relationship between the variables. 


33.
x   38  34  40  46  43  48  60  55  52
y   24  22  27  32  30  31  27  26  28

34.
x   8   4   15  7   6   3   12  10  5
y   18  11  29  18  14  8   25  20  12
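To make the residual-plot procedure concrete, here is a minimal Python sketch that computes the points \((x, y - \hat{y})\) for the Exercise 33 data. It assumes the regression line is the ordinary least-squares line \(\hat{y} = mx + b\) used throughout this chapter; `fit_line` and `residual_points` are hypothetical helper names, and plotting is left out so that only the standard library is needed.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y-hat = m*x + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - m * sum_x / n
    return m, b


def residual_points(xs, ys):
    """Return the points (x, y - y_hat) that make up a residual plot."""
    m, b = fit_line(xs, ys)
    return [(x, y - (m * x + b)) for x, y in zip(xs, ys)]


# Exercise 33 data
xs = [38, 34, 40, 46, 43, 48, 60, 55, 52]
ys = [24, 22, 27, 32, 30, 31, 27, 26, 28]

# Sort by x so that any drift in the residuals is easy to spot.
for x, resid in sorted(residual_points(xs, ys)):
    print(f"x = {x:2d}   residual = {resid:+.3f}")
```

For a least-squares line with an intercept, the residuals always sum to 0 (up to rounding), so it is the pattern of the residuals across x, not their sum, that tells you whether the line is a good representation.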


Influential Points An influential point is a point in a data set that can greatly 
affect the graph of a regression line. An outlier may or may not be an influential 
point. To determine whether a point is influential, find two regression lines: 
one including all the points in the data set, and the other excluding the possible 
influential point. If the slope or y-intercept of the regression line shows significant 
changes, then the point can be considered influential. An influential point can be 
removed from a data set only when there is proper justification. 
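The two-regression-line check described above can be sketched in Python. This is a minimal illustration, not a formal test: it fits the least-squares line with and without one candidate point and reports both lines, leaving the judgment of whether the change is significant to you. The helper names are hypothetical, and the example uses the Exercise 35 data with (44, 8) as the candidate point.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y-hat = m*x + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    return m, sum_y / n - m * sum_x / n


def lines_with_and_without(xs, ys, index):
    """Fit the line to all points, then again with the point at `index` removed."""
    all_points = fit_line(xs, ys)
    xs_rest = xs[:index] + xs[index + 1:]
    ys_rest = ys[:index] + ys[index + 1:]
    return all_points, fit_line(xs_rest, ys_rest)


# Exercise 35 data; the candidate point (44, 8) is at index 7.
xs = [5, 6, 9, 10, 14, 17, 19, 44]
ys = [32, 33, 28, 26, 25, 23, 23, 8]
(m_all, b_all), (m_out, b_out) = lines_with_and_without(xs, ys, index=7)
print(f"with (44, 8):    y-hat = {m_all:.3f}x + {b_all:.3f}")
print(f"without (44, 8): y-hat = {m_out:.3f}x + {b_out:.3f}")
```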


In Exercises 35 and 36, (a) construct a scatter plot of the data, (b) identify any possible 
outliers, and (c) determine whether the point is influential. Explain your reasoning. 


35.
x   5   6   9   10  14  17  19  44
y   32  33  28  26  25  23  23  8


TABLE FOR EXERCISES 37-40
Number of hours, x      1    2    3    4    5     6     7
Number of bacteria, y   165  280  468  780  1310  1920  4900

TABLE FOR EXERCISES 41-44




Transformations to Achieve Linearity When a linear model is not 
appropriate for representing data, other models can be used. In some cases, 
the values of x or y must be transformed to find an appropriate model. In a 
logarithmic transformation, the logarithms of the variables are used instead of the 
original variables when creating a scatter plot and calculating the regression line. 


In Exercises 37-40, use the data shown in the table at the left, which shows the 
number of bacteria present after a certain number of hours. 


37. Find the equation of the regression line for the data. Then construct a scatter 
plot of (x, y) and sketch the regression line with it. 


38. Replace each y-value in the table with its logarithm, log y. Find the equation 
of the regression line for the transformed data. Then construct a scatter plot 
of (x, log y) and sketch the regression line with it. What do you notice? 


39. An exponential equation is a nonlinear regression equation of the form \(y = ab^x\).
Use technology to find and graph the exponential equation for the original data. 
Include the original data in your graph. Note that you can also find this model 
by solving the equation log y = mx + b from Exercise 38 for y. 


40. Compare your results in Exercise 39 with the equation of the regression 
line and its graph in Exercise 37. Which equation is a better model for the 
data? Explain. 
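The transformation in Exercises 38 and 39 can be sketched in Python: fit a least-squares line to \((x, \log y)\), then solve \(\log y = mx + b\) for y, which gives \(y = 10^b (10^m)^x\), an exponential equation \(y = ab^x\) with \(a = 10^b\) and base \(10^m\). This is a minimal sketch with hypothetical helper names. Fitting in log space minimizes the squared errors of log y rather than of y. The x-values in the usage lines (1 through 7 hours) are an assumption; check them against the table.

```python
import math


def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y-hat = m*x + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    return m, sum_y / n - m * sum_x / n


def fit_exponential(xs, ys):
    """Fit y = a * base**x by regressing log10(y) on x (all y must be > 0)."""
    m, b = fit_line(xs, [math.log10(y) for y in ys])
    # log10(y) = m*x + b  =>  y = 10**b * (10**m)**x
    return 10 ** b, 10 ** m


hours = [1, 2, 3, 4, 5, 6, 7]  # assumed x-values; check against the table
bacteria = [165, 280, 468, 780, 1310, 1920, 4900]
a, base = fit_exponential(hours, bacteria)
print(f"y = {a:.3f} * {base:.3f}^x")
```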


In Exercises 41-44, use the data shown in the table at the left.


41. Find the equation of the regression line for the data. Then construct a scatter 
plot of (x, y) and sketch the regression line with it. 


42. Replace each x-value and y-value in the table with its logarithm. Find the 
equation of the regression line for the transformed data. Then construct a 
scatter plot of (log x, log y) and sketch the regression line with it. What do 
you notice? 


43. A power equation is a nonlinear regression equation of the form \(y = ax^b\).


Use technology to find and graph the power equation for the original data. 
Include a scatter plot in your graph. Note that you can also find this model 
by solving the equation log y = m(log x) + b from Exercise 42 for y. 


44. Compare your results in Exercise 43 with the equation of the regression 
line and its graph in Exercise 41. Which equation is a better model for the 
data? Explain. 
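Similarly, the power model in Exercises 42 and 43 comes from fitting a line to \((\log x, \log y)\) and solving \(\log y = m(\log x) + b\) for y, which gives \(y = 10^b x^m\), that is, \(y = ax^b\) with \(a = 10^b\) and exponent m. The Python sketch below uses hypothetical helper names and requires all x- and y-values to be positive; its check uses points generated from \(y = 5x^{1.5}\) so that the correct answer is known in advance.

```python
import math


def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y-hat = m*x + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    return m, sum_y / n - m * sum_x / n


def fit_power(xs, ys):
    """Fit y = a * x**p by regressing log10(y) on log10(x) (x, y > 0)."""
    m, b = fit_line([math.log10(x) for x in xs], [math.log10(y) for y in ys])
    # log10(y) = m*log10(x) + b  =>  y = 10**b * x**m
    return 10 ** b, m


xs = [1, 2, 4, 8, 16]
ys = [5 * x ** 1.5 for x in xs]  # points on y = 5x^1.5
a, p = fit_power(xs, ys)
print(f"y = {a:.3f} * x^{p:.3f}")  # prints y = 5.000 * x^1.500
```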


Logarithmic Equation The logarithmic equation is a nonlinear regression equation of the form \(y = a + b \ln x\). In Exercises 45-48, use this information and technology.


45. Find and graph the logarithmic equation for the data in Exercise 25. 
46. Find and graph the logarithmic equation for the data in Exercise 26. 


47. Compare your results in Exercise 45 with the equation of the regression line 
and its graph. Which equation is a better model for the data? Explain. 


48. Compare your results in Exercise 46 with the equation of the regression line 
and its graph. Which equation is a better model for the data? Explain. 
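Because \(y = a + b \ln x\) is linear in \(\ln x\), one way to fit it is to regress y on \(\ln x\): the slope is b and the intercept is a. The Python sketch below does this for the Exercise 26 data and also reports the sum of squared errors (SSE) of the linear and logarithmic models as one possible basis for the comparisons in Exercises 47 and 48. The helper names are hypothetical, and a smaller SSE by itself does not settle which model is more appropriate.

```python
import math


def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y-hat = m*x + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    return m, sum_y / n - m * sum_x / n


def fit_logarithmic(xs, ys):
    """Fit y = a + b*ln(x) by regressing y on ln(x) (all x must be > 0)."""
    b, a = fit_line([math.log(x) for x in xs], ys)
    return a, b


def sse(xs, ys, predict):
    """Sum of squared residuals for a prediction function."""
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))


# Exercise 26 data: infant age (in years) and hours slept per day
ages = [0.1, 0.2, 0.4, 0.7, 0.6, 0.9, 0.1, 0.2, 0.4, 0.9]
hours = [14.5, 14.3, 14.1, 13.9, 13.9, 13.7, 14.3, 14.2, 14.0, 13.8]

m, c = fit_line(ages, hours)
a, b = fit_logarithmic(ages, hours)
sse_lin = sse(ages, hours, lambda x: m * x + c)
sse_log = sse(ages, hours, lambda x: a + b * math.log(x))
print(f"linear:      y-hat = {m:.3f}x + {c:.3f}, SSE = {sse_lin:.4f}")
print(f"logarithmic: y-hat = {a:.3f} + {b:.3f} ln x, SSE = {sse_log:.4f}")
```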


ACTIVITY

Regression by Eye

You can find the interactive applet for this activity within MyLab Statistics or at www.pearsonglobaleditions.com.


The regression by eye applet allows you to interactively estimate the regression line for a data set. When the applet loads, a data set consisting of 20 points is displayed. You can add points to the plot by clicking the mouse. You can remove a point by clicking on it and dragging it into the trash can, and you can remove all of the points by clicking inside the trash can. You can move the green line on the plot by clicking and dragging its endpoints. Try to move the line so that it minimizes the sum of the squares of the residuals, also known as the sum of squared errors (SSE). Note that the regression line is the line that minimizes the SSE. The SSE values for the green line and for the regression line are shown below the plot, and the equations of both lines are shown above the plot. Click SHOW REGRESSION LINE! to see the regression line in the plot. Click NEW DATA to generate a new data set.

Applet screen (example): Green line: y = 10.017 + 0x; Regression line: y = 1.5 + 0.83x; Green SSE: 472.20698; Regression SSE: 178.7345.

EXPLORE
Step 1 Move the endpoints of the green line to try to approximate the regression line.
Step 2 Click SHOW REGRESSION LINE!.
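The statement that the regression line minimizes the SSE can be checked numerically. The Python sketch below uses the Exercise 34 data as a stand-in for an applet data set, computes the SSE of the least-squares line, and then nudges the slope and intercept the way you would by dragging the green line; every nudged line has a larger SSE. The helper names and the sizes of the nudges are arbitrary choices for illustration.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y-hat = m*x + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    return m, sum_y / n - m * sum_x / n


def sse(xs, ys, m, b):
    """Sum of squared residuals for the line y-hat = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


xs = [8, 4, 15, 7, 6, 3, 12, 10, 5]
ys = [18, 11, 29, 18, 14, 8, 25, 20, 12]

m, b = fit_line(xs, ys)
best = sse(xs, ys, m, b)
print(f"regression line: y-hat = {m:.3f}x + {b:.3f}, SSE = {best:.4f}")

# Nudge the slope and intercept, as you would by dragging the green line.
nudges = [(0.1, 0), (-0.1, 0), (0, 1), (0, -1), (0.05, -0.5)]
for dm, db in nudges:
    print(f"m = {m + dm:.3f}, b = {b + db:.3f}: SSE = {sse(xs, ys, m + dm, b + db):.4f}")
```

For the least-squares line, nudging by (dm, db) raises the SSE by exactly the sum of \((dm \cdot x + db)^2\) over the data, which is why no dragged line can beat it.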


DRAW CONCLUSIONS 


1. Click NEW DATA to generate a new data set. Try to move the green line 
to where the regression line should be. Then click SHOW REGRESSION 
LINE!. Repeat this five times. Describe how you moved each green line. 


2. On a blank plot, place 10 points so that they have a strong positive correlation. Record the equation of the regression line. Then, add a point in the upper-left corner of the plot and record the equation of the regression line. How does the regression line change?


3. Remove the point from the upper-left corner of the plot. Add 10 more points 
so that there is still a strong positive correlation. Record the equation of the 
regression line. Add a point in the upper-left corner of the plot and record the 
equation of the regression line. How does the regression line change? 


4. Use the results of Exercises 2 and 3 to describe what happens to the slope of 
the regression line when an outlier is added as the sample size increases. 


Correlation of Body Measurements

In a study published in Medicine and Science in Sports and Exercise (volume 17, no. 2, page 211), the measurements of 252 men (ages 22-81) were taken. Of the 14 measurements taken of each man, some have significant correlations and others do not. For instance, the scatter plot at the right shows that the hip and abdomen circumferences of the men have a strong linear correlation (r ≈ 0.874). The partial table shown here lists only the first nine rows of the data.


Figure: Hip and Abdomen Circumferences (scatter plot; x-axis: hip circumference in centimeters, y-axis: abdomen circumference in centimeters)


Age   Weight   Height   Neck   Chest   Abdom.   Hip     Thigh   Knee   Ankle   Bicep   Forearm   Wrist   Body
(yr)  (lb)     (in.)    (cm)   (cm)    (cm)     (cm)    (cm)    (cm)   (cm)    (cm)    (cm)      (cm)    fat %
22    173.25   72.25    38.5   93.6    83.0     98.7    58.7    37.3   23.4    30.5    28.9      18.2    6.1
22    154.00   66.25    34.0   95.8    87.9     99.2    59.6    38.9   24.0    28.8    25.2      16.6    25.3
23    154.25   67.75    36.2   93.1    85.2     94.5    59.0    37.3   21.9    32.0    27.4      17.1    12.3
23    198.25   73.50    42.1   99.6    88.6     104.1   63.1    41.7   25.0    35.6    30.0      19.2    11.7
23    159.75   72.25    35.5   92.1    77.1     93.9    56.1    36.1   22.7    30.5    27.2      18.2    9.4
23    188.15   77.50    38.0   96.6    85.3     102.5   59.1    37.6   23.2    31.8    29.7      18.3    10.3
24    184.25   71.25    34.4   97.3    100.0    101.9   63.2    42.2   24.0    32.2    27.7      17.7    28.7
24    210.25   74.75    39.0   104.5   94.4     107.8   66.0    42.0   25.6    35.7    30.6      18.8    20.9
24    156.00   70.75    35.7   92.7    81.9     95.3    56.4    36.5   22.0    33.5    28.3      17.3    14.2


Source: “Generalized Body Composition Prediction Equation for Men Using Simple Measurement Techniques” by K.W. Penrose et al. (1985). Medicine and Science in Sports and Exercise, vol. 17, no. 2, p. 189.


EXERCISES

1. Using your intuition, classify each (x, y) pair as having a weak correlation (0 < r < 0.5), a moderate correlation (0.5 < r < 0.8), or a strong correlation (0.8 < r < 1.0).
(a) (weight, neck)      (b) (weight, height)
(c) (age, body fat)     (d) (chest, hip)
(e) (age, wrist)        (f) (ankle, wrist)
(g) (forearm, height)   (h) (bicep, forearm)
(i) (weight, body fat)  (j) (knee, thigh)
(k) (hip, abdomen)      (l) (abdomen, hip)

2. Use technology to find the correlation coefficient for each pair in Exercise 1. Compare your results with those obtained by intuition.

3. Use technology to find the regression line for each pair in Exercise 1 that has a strong correlation.

4. Use the results of Exercise 3 to predict the following.
(a) The hip circumference of a man whose chest circumference is 95 centimeters
(b) The height of a man whose forearm circumference is 28 centimeters

5. Are there pairs of measurements that have stronger correlation coefficients than 0.85? Use technology and intuition to reach a conclusion.




9.3 Measures of Regression and Prediction Intervals


What You Should Learn

• How to interpret the three types of variation about a regression line
• How to find and interpret the coefficient of determination
• How to find and interpret the standard error of estimate for a regression line
• How to construct and interpret a prediction interval for y




Variation About a Regression Line • The Coefficient of Determination • The Standard Error of Estimate • Prediction Intervals


Variation About a Regression Line 


In this section, you will study two measures used in correlation and regression 
studies—the coefficient of determination and the standard error of estimate. 
You will also learn how to construct a prediction interval for y using a regression 
equation and a given value of x. Before studying these concepts, you need to 
understand the three types of variation about a regression line. 

To find the total variation, the explained variation, and the unexplained variation about a regression line, you must first calculate the total deviation, the explained deviation, and the unexplained deviation for each ordered pair \((x_i, y_i)\) in a data set. These deviations are shown in the figure.


Figure: the total, explained, and unexplained deviations of a data point \((x_i, y_i)\) about the regression line and the mean \(\bar{y}\)

\[\text{Total deviation} = y_i - \bar{y} \qquad \text{Explained deviation} = \hat{y}_i - \bar{y} \qquad \text{Unexplained deviation} = y_i - \hat{y}_i\]


After calculating the deviations for each data point \((x_i, y_i)\), you can find the total variation, the explained variation, and the unexplained variation.


DEFINITION 


The total variation about a regression line is the sum of the squares of the 
differences between the y-value of each ordered pair and the mean of y. 


\[\text{Total variation} = \sum (y_i - \bar{y})^2\]


The explained variation is the sum of the squares of the differences between 
each predicted y-value and the mean of y. 


\[\text{Explained variation} = \sum (\hat{y}_i - \bar{y})^2\]


The unexplained variation is the sum of the squares of the differences 
between the y-value of each ordered pair and each corresponding predicted 
y-value. 


\[\text{Unexplained variation} = \sum (y_i - \hat{y}_i)^2\]


The sum of the explained and unexplained variations is equal to the total 
variation. 


Total variation = Explained variation + Unexplained variation 


As its name implies, the explained variation can be explained by the 
relationship between x and y. The unexplained variation cannot be explained by 
the relationship between x and y and is due to other factors, such as sampling 
error, coincidence, or lurking variables. (Recall from Section 9.1 that lurking 
variables are variables that have an effect on the variables being studied but are 
not included in the study.) 
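The three variations can be computed directly. The Python sketch below uses the gross domestic product and carbon dioxide emissions data from this section's examples (x in trillions of dollars, y in millions of metric tons), fits the least-squares line, and confirms numerically that the total variation equals the explained variation plus the unexplained variation. That identity holds for the least-squares line with an intercept; it need not hold for an arbitrary line. The function name `variations` is a hypothetical helper.

```python
def variations(xs, ys):
    """Return (total, explained, unexplained) variation about the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    y_hat = [m * x + b for x in xs]
    total = sum((y - mean_y) ** 2 for y in ys)
    explained = sum((yh - mean_y) ** 2 for yh in y_hat)
    unexplained = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))
    return total, explained, unexplained


gdp = [1.8, 1.3, 2.4, 1.5, 3.9, 2.1, 0.9, 1.4, 3.0, 4.6]
co2 = [604.4, 434.2, 544.0, 370.4, 742.3, 340.5, 232.0, 262.3, 441.9, 1157.7]
total, explained, unexplained = variations(gdp, co2)
print(f"total = {total:.2f}, explained = {explained:.2f}, unexplained = {unexplained:.2f}")
```

The unexplained variation (about 161,165) is the same sum of squared residuals that appears in the standard error of estimate example later in this section.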


Picturing the World

Janette Benson (Psychology Department, University of Denver) performed a study relating the age at which infants crawl (in weeks after birth) with the average monthly temperature six months after birth. Her results are based on a sample of 414 infants. Benson believes that the reason for the correlation of temperature and crawling age is that parents tend to bundle infants in more restrictive clothing and blankets during cold months. This bundling doesn’t allow the infant as much opportunity to move and experiment with crawling.

Figure: crawling age (in weeks) versus temperature (in °F)

The correlation coefficient is r = −0.701. What percent of the variation in the data can be explained? What percent is due to other factors, such as sampling error, coincidence, or lurking variables?


SECTION 9.3 Measures of Regression and Prediction Intervals 521 


The Coefficient of Determination 


You already know how to calculate the correlation coefficient r. The square of 
this coefficient is called the coefficient of determination. It can be shown that 
the coefficient of determination is equal to the ratio of the explained variation 
to the total variation. 


DEFINITION 


The coefficient of determination \(r^2\) is the ratio of the explained variation to the total variation. That is,

\[r^2 = \frac{\text{Explained variation}}{\text{Total variation}}.\]


It is important that you interpret the coefficient of determination correctly. For instance, if the correlation coefficient is r = 0.900, then the coefficient of determination is

\[r^2 = (0.900)^2 = 0.810.\]

This means that 81% of the variation in y can be explained by the relationship between x and y. The remaining 19% of the variation is unexplained and is due to other factors, such as sampling error, coincidence, or lurking variables.
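A quick numerical check of the definition: the Python sketch below computes r from the sum formula and computes \(r^2\) as explained variation divided by total variation for the gross domestic product and carbon dioxide emissions data. The two values of \(r^2\) agree up to floating-point rounding. The function names are hypothetical helpers, and the check illustrates the identity rather than proving it.

```python
import math


def correlation(xs, ys):
    """Correlation coefficient r from the sum formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    num = n * sum_xy - sum_x * sum_y
    den = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return num / den


def r_squared_from_variation(xs, ys):
    """Explained variation divided by total variation for the least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    m = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - m * mean_x
    explained = sum((m * x + b - mean_y) ** 2 for x in xs)
    total = sum((y - mean_y) ** 2 for y in ys)
    return explained / total


gdp = [1.8, 1.3, 2.4, 1.5, 3.9, 2.1, 0.9, 1.4, 3.0, 4.6]
co2 = [604.4, 434.2, 544.0, 370.4, 742.3, 340.5, 232.0, 262.3, 441.9, 1157.7]
r = correlation(gdp, co2)
ratio = r_squared_from_variation(gdp, co2)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}, explained/total = {ratio:.3f}")
```

Unrounded, \(r^2\) for these data is about 0.763; the value 0.764 in the example below comes from squaring the rounded correlation coefficient 0.874.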


EXAMPLE 1
Finding the Coefficient of Determination


The correlation coefficient for the gross domestic products and carbon dioxide emissions data is

r ≈ 0.874. See Example 4 in Section 9.1.


Find the coefficient of determination. What does this tell you about the 
explained variation of the data about the regression line? about the unexplained 
variation? 


SOLUTION
The coefficient of determination is

\[r^2 = (0.874)^2 \approx 0.764\]

(rounded to three decimal places).


Interpretation About 76.4% of the variation in the carbon dioxide 
emissions can be explained by the relationship between the gross domestic 
products and carbon dioxide emissions. About 23.6% of the variation is 
unexplained and is due to other factors, such as sampling error, coincidence, 
or lurking variables. 


TRY IT YOURSELF 1 
The correlation coefficient for the Old Faithful data is 


r = 0.979. See Example 5 in Section 9.1. 


Find the coefficient of determination. What does this tell you about the 
explained variation of the data about the regression line? about the unexplained 
variation? 

Answer: Page A38 




The Standard Error of Estimate 


When a \(\hat{y}\)-value is predicted from an x-value, the prediction is a point estimate. You can construct an interval estimate for \(\hat{y}\), but first you need to calculate the standard error of estimate.


DEFINITION 


The standard error of estimate \(s_e\) is the standard deviation of the observed \(y_i\)-values about the predicted \(\hat{y}\)-value for a given \(x_i\)-value. It is given by

\[s_e = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}\]

where n is the number of pairs of data.


From this formula, you can see that the standard error of estimate is the square root of the unexplained variation divided by n − 2. So, the closer the observed y-values are to the predicted \(\hat{y}\)-values, the smaller the standard error of estimate will be.


GUIDELINES
Finding the Standard Error of Estimate \(s_e\)

1. Make a table with the five column headings shown in symbols.
   In symbols: \(x_i\), \(y_i\), \(\hat{y}_i\), \((y_i - \hat{y}_i)\), \((y_i - \hat{y}_i)^2\)
2. Use the regression equation to calculate the predicted y-values.
   In symbols: \(\hat{y}_i = mx_i + b\)
3. Calculate the sum of the squares of the differences between each observed y-value and the corresponding predicted y-value.
   In symbols: \(\sum (y_i - \hat{y}_i)^2\)
4. Find the standard error of estimate.
   In symbols: \(s_e = \sqrt{\dfrac{\sum (y_i - \hat{y}_i)^2}{n - 2}}\)
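The four steps above translate directly into code. This minimal Python sketch follows them for the gross domestic product and carbon dioxide emissions data, using the rounded regression equation \(\hat{y} = 199.535x + 56.036\) that appears in Example 2; the function name is a hypothetical helper.

```python
import math


def standard_error_of_estimate(xs, ys, m, b):
    """Steps 1-4: predicted values, squared residuals, their sum, then s_e."""
    n = len(xs)
    rows = []  # Step 1: one row per data point with the five columns
    for x, y in zip(xs, ys):
        y_hat = m * x + b  # Step 2: predicted y-value
        rows.append((x, y, y_hat, y - y_hat, (y - y_hat) ** 2))
    unexplained = sum(row[4] for row in rows)  # Step 3: sum of squared residuals
    return math.sqrt(unexplained / (n - 2)), rows  # Step 4


gdp = [1.8, 1.3, 2.4, 1.5, 3.9, 2.1, 0.9, 1.4, 3.0, 4.6]
co2 = [604.4, 434.2, 544.0, 370.4, 742.3, 340.5, 232.0, 262.3, 441.9, 1157.7]
s_e, rows = standard_error_of_estimate(gdp, co2, m=199.535, b=56.036)
for x, y, y_hat, resid, sq in rows:
    print(f"{x:4.1f} {y:8.1f} {y_hat:9.4f} {resid:10.4f} {sq:14.4f}")
print(f"s_e = {s_e:.3f}")
```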


Instead of the formula used in Step 4, you can also find the standard error of estimate using the formula

\[s_e = \sqrt{\frac{\sum y^2 - b\sum y - m\sum xy}{n - 2}}.\]

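This formula is easy to use if you have already calculated the slope m, the y-intercept b, and several of the sums. For instance, consider the gross domestic products and carbon dioxide emissions data (see Example 4 in Section 9.1 and Example 1 in Section 9.2). To use the alternative formula, note that the regression equation for these data is \(\hat{y} = 199.535x + 56.036\) and the values of the sums are \(\sum y^2 = 3{,}312{,}080.89\), \(\sum y = 5129.7\), and \(\sum xy = 14{,}350.74\). So, using the alternative formula, the standard error of estimate is

\[s_e = \sqrt{\frac{\sum y^2 - b\sum y - m\sum xy}{n - 2}} = \sqrt{\frac{3{,}312{,}080.89 - (56.036)(5129.7) - (199.535)(14{,}350.74)}{10 - 2}} \approx 141.932.\]

The small difference from the value 141.935 found in Example 2 comes from rounding: the alternative formula assumes the exact least-squares values of m and b, so substituting the rounded values 199.535 and 56.036 shifts its result slightly.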
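The alternative formula needs only the sums, so it is easy to script. The Python sketch below plugs in the sums and rounded coefficients quoted above; the function name is a hypothetical helper.

```python
import math


def standard_error_from_sums(n, sum_y, sum_y2, sum_xy, m, b):
    """Alternative formula: s_e = sqrt((sum y^2 - b*sum y - m*sum xy) / (n - 2))."""
    return math.sqrt((sum_y2 - b * sum_y - m * sum_xy) / (n - 2))


s_e = standard_error_from_sums(n=10, sum_y=5129.7, sum_y2=3312080.89,
                               sum_xy=14350.74, m=199.535, b=56.036)
print(f"s_e = {s_e:.3f}")  # prints s_e = 141.932
```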




EXAMPLE 2
Finding the Standard Error of Estimate

The regression equation for the gross domestic products and carbon dioxide emissions data is

\(\hat{y} = 199.535x + 56.036\). See Example 1 in Section 9.2.

Find the standard error of estimate.


SOLUTION 


Use a table to calculate the sum of the squared differences of each observed 
y-value and the corresponding predicted y-value. 


\(x_i\)   \(y_i\)    \(\hat{y}_i\)   \(y_i - \hat{y}_i\)   \((y_i - \hat{y}_i)^2\)
1.8   604.4    415.199     189.201     35,797.0184
1.3   434.2    315.4315    118.7685    14,105.95659
2.4   544.0    534.92      9.08        82.4464
1.5   370.4    355.3385    15.0615     226.8487822
3.9   742.3    834.2225    −91.9225    8,449.746006
2.1   340.5    475.0595    −134.5595   18,106.25904
0.9   232.0    235.6175    −3.6175     13.08630625
1.4   262.3    335.385     −73.085     5,341.417225
3.0   441.9    654.641     −212.741    45,258.73308
4.6   1157.7   973.897     183.803     33,783.54281
                                       Σ = 161,165.0546 (unexplained variation)
When n = 10 and \(\sum (y_i - \hat{y}_i)^2 = 161{,}165.0546\) are used, the standard error of estimate is

\[s_e = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}} = \sqrt{\frac{161{,}165.0546}{10 - 2}} \approx 141.935.\]


Interpretation The standard error of estimate of the carbon dioxide emissions 
for a specific gross domestic product is about 141.935 million metric tons. 


TRY IT YOURSELF 2 


A researcher collects the data shown below and concludes that there is a 
significant relationship between the amount of radio advertising time (in 
minutes per week) and the weekly sales of a product (in hundreds of dollars). 


Radio ad time, x   15  20  20  30  40  45  50  60
Weekly sales, y    26  32  38  56  54  78  80  88

Find the standard error of estimate. Use the regression equation

\(\hat{y} = 1.405x + 7.311\).
Answer: Page A38 




Prediction Intervals 


Recall from Section 9.1 that one of the requirements for calculating a correlation 
coefficient is that the two variables x and y have a bivariate normal distribution. 
Two variables have a bivariate normal distribution when for any fixed values of 
x the corresponding values of y are normally distributed, and for any fixed values 
of y the corresponding values of x are normally distributed. 


Bivariate Normal Distribution 


Because regression equations are determined using random samples of paired data and because x and y are assumed to have a bivariate normal distribution, you can construct a prediction interval for the true value of y. To construct the prediction interval, use a t-distribution with n − 2 degrees of freedom.


DEFINITION 


Given a linear regression equation \(\hat{y} = mx + b\) and \(x_0\), a specific value of x, a c-prediction interval for y is \(\hat{y} - E < y < \hat{y} + E\), where

\[E = t_c s_e \sqrt{1 + \frac{1}{n} + \frac{n(x_0 - \bar{x})^2}{n\sum x^2 - (\sum x)^2}}.\]

The point estimate is \(\hat{y}\) and the margin of error is E. The probability that the prediction interval contains y is c (the level of confidence), assuming that the estimation process is repeated a large number of times.


GUIDELINES
Constructing a Prediction Interval for y for a Specific Value of x

1. Identify the number n of pairs of data and the degrees of freedom.
   In symbols: d.f. = n − 2
2. Use the regression equation and the given x-value to find the point estimate \(\hat{y}\).
   In symbols: \(\hat{y}_i = mx_i + b\)
3. Find the critical value \(t_c\) that corresponds to the given level of confidence c.
   In symbols: Use Table 5 in Appendix B.
4. Find the standard error of estimate \(s_e\).
   In symbols: \(s_e = \sqrt{\dfrac{\sum (y_i - \hat{y}_i)^2}{n - 2}}\)
5. Find the margin of error E.
   In symbols: \(E = t_c s_e \sqrt{1 + \dfrac{1}{n} + \dfrac{n(x_0 - \bar{x})^2}{n\sum x^2 - (\sum x)^2}}\)
6. Find the left and right endpoints and form the prediction interval.
   In symbols: Left endpoint: \(\hat{y} - E\); Right endpoint: \(\hat{y} + E\); Interval: \(\hat{y} - E < y < \hat{y} + E\)

Study Tip
The formulas for \(s_e\) and E use the quantities \(\sum (y_i - \hat{y}_i)^2\), \(\sum x\), and \(\sum x^2\). Use a table to calculate these quantities.


Figure: 90% prediction interval bands about the regression line (x-axis: GDP in trillions of dollars; y-axis: CO2 emissions in millions of metric tons)




EXAMPLE 3
Constructing a Prediction Interval

Using the results of Example 2, construct a 90% prediction interval for the 
carbon dioxide emissions when the gross domestic product is $2.8 trillion. 
What can you conclude? 

SOLUTION 


Because n = 10, there are d.f. = 10 − 2 = 8 degrees of freedom. Using the regression equation \(\hat{y} = 199.535x + 56.036\) and x = 2.8, the point estimate is

\[\hat{y} = 199.535(2.8) + 56.036 = 614.734.\]

From Table 5, the critical value is \(t_c = 1.860\), and from Example 2, \(s_e \approx 141.935\). From Example 4 in Section 9.1, you found that \(\sum x = 22.9\) and \(\sum x^2 = 65.49\). Also, \(\bar{x} = 2.29\). Using these values, the margin of error is

\[E = t_c s_e \sqrt{1 + \frac{1}{n} + \frac{n(x_0 - \bar{x})^2}{n\sum x^2 - (\sum x)^2}} = (1.860)(141.935)\sqrt{1 + \frac{1}{10} + \frac{10(2.8 - 2.29)^2}{10(65.49) - (22.9)^2}} \approx 279.382.\]


Using \(\hat{y} = 614.734\) and \(E \approx 279.382\), the prediction interval is constructed as shown.

Left endpoint: \(\hat{y} - E = 614.734 - 279.382 = 335.352\)
Right endpoint: \(\hat{y} + E = 614.734 + 279.382 = 894.116\)

\[335.352 < y < 894.116\]


Interpretation You can be 90% confident that when the gross domestic 
product is $2.8 trillion, the carbon dioxide emissions will be between 335.352 
and 894.116 million metric tons. 
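The six guideline steps can be collected into one function. The Python sketch below takes the raw data, a value \(x_0\), and the critical value \(t_c\) (read from Table 5, because the Python standard library has no t-distribution) and returns the interval. It recomputes m, b, and \(s_e\) without rounding, so its endpoints can differ from Example 3 in the third decimal place. The function name is a hypothetical helper.

```python
import math


def prediction_interval(xs, ys, x0, t_c):
    """Return (y_hat, E, left, right) for a prediction interval for y at x0."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - m * sum_x / n
    # Standard error of estimate (Step 4)
    s_e = math.sqrt(sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys)) / (n - 2))
    y_hat = m * x0 + b  # point estimate (Step 2)
    x_bar = sum_x / n
    # Margin of error (Step 5)
    E = t_c * s_e * math.sqrt(1 + 1 / n + n * (x0 - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2))
    return y_hat, E, y_hat - E, y_hat + E  # Step 6


gdp = [1.8, 1.3, 2.4, 1.5, 3.9, 2.1, 0.9, 1.4, 3.0, 4.6]
co2 = [604.4, 434.2, 544.0, 370.4, 742.3, 340.5, 232.0, 262.3, 441.9, 1157.7]
# 90% level of confidence, d.f. = 8, so t_c = 1.860 from Table 5
y_hat, E, left, right = prediction_interval(gdp, co2, x0=2.8, t_c=1.860)
print(f"{left:.3f} < y < {right:.3f}")
```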


TRY IT YOURSELF 3 


Using the results of Example 2, construct a 95% prediction interval for the 
carbon dioxide emissions when the gross domestic product is $4 trillion. What 
can you conclude? 

Answer: Page A38 


For x-values near \(\bar{x}\), the prediction interval for y becomes narrower. For x-values farther from \(\bar{x}\), the prediction interval for y becomes wider. (This is one reason why the regression equation should not be used to predict y-values for x-values outside the range of the observed x-values in the data.) For instance, consider the 90% prediction intervals for y in Example 3 shown at the left. The range of the x-values is \(0.9 \le x \le 4.6\). Notice how the prediction interval bands curve away from the regression line as x gets closer to 0.9 or to 4.6.
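To see the widening numerically, the Python sketch below evaluates the margin of error E from the definition at several values of \(x_0\), using the sums, \(s_e \approx 141.935\), and \(t_c = 1.860\) from Example 3. E is smallest at \(x_0 = \bar{x} = 2.29\), where the last term under the square root vanishes, and grows toward the ends of the observed range. The function name is a hypothetical helper.

```python
import math


def margin_of_error(x0, n, sum_x, sum_x2, s_e, t_c):
    """Margin of error E for a prediction interval for y at x0."""
    x_bar = sum_x / n
    return t_c * s_e * math.sqrt(1 + 1 / n + n * (x0 - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2))


for x0 in [0.9, 1.5, 2.29, 3.0, 4.6]:
    E = margin_of_error(x0, n=10, sum_x=22.9, sum_x2=65.49, s_e=141.935, t_c=1.860)
    print(f"x0 = {x0:4.2f}: E = {E:.3f}")
```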




9.3 EXERCISES 


For Extra Help: MyLab Statistics 


Building Basic Skills and Vocabulary 


Graphical Analysis In Exercises 1-3, use the figure.

Figure: a data point \((x_i, y_i)\) plotted with a regression line


1. Describe the total variation about a regression line in words and in symbols. 
2. Describe the explained variation about a regression line in words and in symbols. 


3. Describe the unexplained variation about a regression line in words and in symbols. 
4. The coefficient of determination \(r^2\) is the ratio of which two types of variations? What does \(r^2\) measure? What does \(1 - r^2\) measure?


5. What is the coefficient of determination for two variables that have perfect 
positive linear correlation or perfect negative linear correlation? Interpret 
your answer. 


6. Two variables have a bivariate normal distribution. Explain what this means. 


In Exercises 7-10, use the value of the correlation coefficient r to calculate the coefficient of determination \(r^2\). What does this tell you about the explained variation of the data about the regression line? about the unexplained variation?

7. r = 0.465     8. r = −0.328
9. r = −0.957    10. r = 0.881


Using and Interpreting Concepts 


Finding the Coefficient of Determination and the Standard Error of Estimate In Exercises 11-20, use the data to (a) find the coefficient of determination \(r^2\) and interpret the result, and (b) find the standard error of estimate \(s_e\) and interpret the result.


11. Stock Offerings The numbers of initial public offerings of stock issued and the total proceeds of these offerings (in millions of dollars) for 12 years are shown in the table. The equation of the regression line is \(\hat{y} = 104.965x + 14{,}093.666\). (Source: University of Florida)

Number of offerings, x   316     485     382     719     70      67
Proceeds, y              34,314  64,906  64,876  34,241  22,136  10,068

Number of offerings, x   183     168     162     162     21      43
Proceeds, y              31,927  28,593  30,648  35,762  22,762  13,307




12. Earnings of Men and Women The table shows the median annual earnings (in dollars) of male and female workers from 10 states in a recent year. The equation of the regression line is \(\hat{y} = 1.005x - 10{,}770.313\). (Source: U.S. Census Bureau)

Median annual earnings of male workers, x     50,976  46,763  46,934  41,092  47,960
Median annual earnings of female workers, y   40,214  36,834  36,841  31,110  40,173

Median annual earnings of male workers, x     43,829  47,092  51,628  46,123  61,666
Median annual earnings of female workers, y   32,096  35,753  41,690  33,443  50,802


13. Goals Allowed and Points The table shows the number of goals allowed and the total points earned (2 points for a win and 1 point for an overtime or shootout loss) by the 14 Western Conference teams in the 2016-2017 National Hockey League season. The equation of the regression line is \(\hat{y} = -0.573x + 220.087\). (Source: ESPN)

Goals allowed, x   213  208  218  224  256  262  278
Points, y          109  106  99   94   87   79   48

Goals allowed, x   200  212  201  221  205  260  243
Points, y          105  103  99   94   86   70   69


14. Trees The table shows the heights (in feet) and trunk diameters (in inches) of eight trees. The equation of the regression line is \(\hat{y} = 0.479x - 24.086\).

Height, x           70   72    75    76    85    78    77    82
Trunk diameter, y   8.3  10.5  11.0  11.4  14.9  14.0  16.3  15.8


15. STEM Employment and Mean Wage The table shows the percentage of employment in STEM (science, technology, engineering, and math) occupations and mean annual wage (in thousands of dollars) for 16 industries. The equation of the regression line is \(\hat{y} = 1.153x + 46.374\). (Source: U.S. Bureau of Labor Statistics)

Percentage of employment in STEM occupations, x   10.5   15.8  1.6   11.0  8.4   1.1   23.7  7.0
Mean annual wage, y                               63.35  73.1  51.3  49.6  54.8  46.2  70.4  67.4

Percentage of employment in STEM occupations, x   1.0    34    16.7  3.7   4.8   1.0   1.4   8.4
Mean annual wage, y                               45.0   77.6  79.6  36.7  52.4  51.0  39.2  57.4




16. Voter Turnout The Australian voting age populations (in millions) and the number of votes cast (in millions) for the democratic elections for nine election years are shown in the table. The equation of the regression line is \(\hat{y} = 0.8194x + 1.842\). (Source: Institute for Democracy and Electoral Assistance)

Voting age population, x     18.1  17.4  16.2  15.7  15.0
Votes cast in elections, y   16.5  16.2  15.1  14.9  14.1

Voting age population, x     14.3  14.0  13.5  13.0
Votes cast in elections, y   13.6  13.3  12.9  12.4


17. Wheat The table shows the quantity of wheat (in millions of kilograms per year) produced by India and the quantity of wheat (in millions of kilograms per year) exported by India for seven years. The equation of the regression line is \(\hat{y} = 0.249x - 19023\). (Source: IndexMundi)

Produced, x   69355  75807  78570  80679  80804  86874  94882
Exported, y   94     49     23     58     72     891    6824


18. Fund Assets The table shows the total assets (in billions of dollars) of individual retirement accounts (IRAs) and federal defined benefit (DB) plans for ten years. The equation of the regression line is \(\hat{y} = 0.140x + 453.959\). (Source: Investment Company Institute)

IRAs, x               4748  3681  4488  5029  5153
Federal DB plans, y   978   1033  1095  1161  1230

IRAs, x               5785  6819  7292  7329  7850
Federal DB plans, y   1270  1370  1438  1512  1595


19. New-Vehicle Sales The table shows the numbers of new-vehicle sales (in thousands) in the United States for Ford and General Motors for 11 years. The equation of the regression line is \(\hat{y} = 1.624x - 747.304\). (Source: NADA Industry Analysis Division)

New-vehicle sales (Ford), x             3107  2848  2502  1942  1656  1905
New-vehicle sales (General Motors), y   4457  4068  3825  2956  2072  2211

New-vehicle sales (Ford), x             2111  2206  2435  2418  2549
New-vehicle sales (General Motors), y   2504  2596  2786  2935  3082


Keeping cars longer
The median age of vehicles on U.S. roads for eight different years:

Median age in years
Cars, x   Light trucks, y
10.4      9.8
10.5      10.1
10.8      10.5
11.1      10.8
11.3      11.1
11.4      11.3
11.4      11.4

(Source: Polk Co., IHS Automotive)

FIGURE FOR EXERCISES 31-34




20. New-Vehicle Sales The table shows the numbers of new-vehicle sales (in thousands) in the United States for Toyota and Honda for 11 years. The equation of the regression line is \(\hat{y} = 0.460x + 410.839\). (Source: NADA Industry Analysis Division)

New-vehicle sales (Toyota), x   2260  2543  2621  2218  1770  1764
New-vehicle sales (Honda), y    1463  1509  1552  1429  1151  1231

New-vehicle sales (Toyota), x   1645  2083  2236  2374  2499
New-vehicle sales (Honda), y    1147  1423  1525  1541  1587


Constructing and Interpreting a Prediction Interval In Exercises
21-30, construct the indicated prediction interval and interpret the results.

21. Proceeds Construct a 95% prediction interval for the proceeds from initial
public offerings in Exercise 11 when the number of offerings is 450.

22. Earnings of Women Construct a 95% prediction interval for the median
annual earnings of female workers in Exercise 12 when the median annual
earnings of male workers is $45,637.

23. Points Construct a 90% prediction interval for total points earned in
Exercise 13 when the number of goals allowed by the team is 250.

24. Trees Construct a 90% prediction interval for the trunk diameter of a tree
in Exercise 14 when the height is 80 feet.

25. Mean Wage Construct a 99% prediction interval for the mean annual wage
in Exercise 15 when the percentage of employment in STEM occupations is
13% in the industry.

26. Voter Turnout Construct a 99% prediction interval for the number of votes
cast in Exercise 16 when the voting age population is 15 million.

27. Wheat Construct an 80% prediction interval for the quantity of wheat
exported by India in Exercise 17 when the quantity of wheat produced by
India is 99,000 million kilograms per year.

28. Total Assets Construct a 90% prediction interval for the total assets in
federal defined benefit plans in Exercise 18 when the total assets in IRAs is
$6200 billion.

29. New-Vehicle Sales Construct a 95% prediction interval for new-vehicle
sales for General Motors in Exercise 19 when the number of new vehicles
sold by Ford is 2628 thousand.

30. New-Vehicle Sales Construct a 99% prediction interval for new-vehicle
sales for Honda in Exercise 20 when the number of new vehicles sold by
Toyota is 2359 thousand.
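Exercises 21-30 all rely on the same prediction interval, ŷ − E < y < ŷ + E, where E = t_c s_e √(1 + 1/n + n(x₀ − x̄)²/(nΣx² − (Σx)²)). The Python sketch below shows that calculation under stated assumptions: the function name `prediction_interval` and the five data pairs are hypothetical, and the critical value t_c is looked up by hand in Table 5 (n − 2 degrees of freedom) and passed in, because the Python standard library has no t distribution.

```python
import math


def prediction_interval(xs, ys, x0, t_c):
    """Return (y_hat, lower, upper) for a prediction interval of y at x = x0.

    Hypothetical helper: t_c is the critical value for n - 2 degrees of
    freedom, supplied by the caller from a t-table.
    """
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sum_x2 - sum_x ** 2           # n*Σx² − (Σx)²

    m = (n * sum_xy - sum_x * sum_y) / denom  # slope of the regression line
    b = sum_y / n - m * sum_x / n             # y-intercept, b = ȳ − m·x̄

    # Standard error of estimate: s_e = sqrt(Σ(y − ŷ)² / (n − 2))
    sse = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))

    x_bar = sum_x / n
    e = t_c * s_e * math.sqrt(1 + 1 / n + n * (x0 - x_bar) ** 2 / denom)
    y_hat = m * x0 + b
    return y_hat, y_hat - e, y_hat + e


# Hypothetical data; 3.182 is the 95% critical value for d.f. = 3.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
y_hat, lower, upper = prediction_interval(xs, ys, x0=3, t_c=3.182)
print(f"y_hat = {y_hat:.3f}, interval = ({lower:.3f}, {upper:.3f})")
```

The interval is narrowest when x₀ equals x̄ and widens as x₀ moves away from the mean, which is why the term n(x₀ − x̄)² appears under the square root.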


Old Vehicles In Exercises 31-34, use the figure shown at the left.

31. Scatter Plot Construct a scatter plot of the data. Show ȳ and x̄ on the graph.

32. Regression Line Find and draw the regression line.

33. Coefficient of Determination Find the coefficient of determination r² and
interpret the results.

34. Error of Estimate Find the standard error of estimate s_e and interpret the
results.
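Exercises 33 and 34 rest on splitting the variation of y: r² is the explained variation divided by the total variation, and s_e is the square root of the unexplained variation divided by n − 2. The minimal sketch below (hypothetical function name, made-up data rather than the vehicle data) computes all three sums directly. It writes the slope in the centered form m = Σ(x − x̄)(y − ȳ)/Σ(x − x̄)², which is algebraically equivalent to the textbook's sum formula.

```python
import math


def r2_and_standard_error(xs, ys):
    """Return (m, b, r2, s_e) for the least-squares line through (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Centered form of the slope; same value as the textbook's sum formula.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    m = s_xy / s_xx
    b = y_bar - m * x_bar

    y_hat = [m * x + b for x in xs]
    explained = sum((yh - y_bar) ** 2 for yh in y_hat)            # Σ(ŷ − ȳ)²
    unexplained = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # Σ(y − ŷ)²
    total = sum((y - y_bar) ** 2 for y in ys)                     # Σ(y − ȳ)²

    r2 = explained / total
    s_e = math.sqrt(unexplained / (n - 2))
    return m, b, r2, s_e


m, b, r2, s_e = r2_and_standard_error([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"y-hat = {m:.3f}x + {b:.3f}, r^2 = {r2:.3f}, s_e = {s_e:.3f}")
```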




CHAPTER 9 Correlation and Regression 


Extending Concepts 


Hypothesis Testing for Slope When testing the slope M of the regression
line for the population, you usually test that the slope is 0, or H₀: M = 0. A slope
of 0 indicates that there is no linear relationship between x and y. To perform the
t-test for the slope M, use the standardized test statistic

$$t = \frac{m}{s_e}\sqrt{\sum x^2 - \frac{\left(\sum x\right)^2}{n}}$$

with n − 2 degrees of freedom. Then, using the critical values found in Table 5
in Appendix B, decide whether to reject or fail to reject the null
hypothesis. You can also use the LinRegTTest feature on a TI-84 Plus to calculate
the standardized test statistic as well as the corresponding P-value. If P ≤ α, then
reject the null hypothesis. If P > α, then fail to reject H₀.
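The standardized test statistic above can be sketched in Python as follows. This is an illustration, not the TI-84 LinRegTTest routine: the function name is hypothetical, the data are made up, and the decision step compares |t| with a two-tailed critical value t_c that you still look up in Table 5. As a cross-check, the same t also equals r√(n − 2)/√(1 − r²), the statistic used to test a correlation coefficient.

```python
import math


def slope_t_test(xs, ys, t_c):
    """t-test of H0: M = 0 for the population regression slope.

    Returns (t, degrees_of_freedom, reject_h0). t_c is the two-tailed
    critical value for n - 2 degrees of freedom, supplied by the caller.
    """
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - m * sum_x / n
    sse = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))  # standard error of estimate

    # t = (m / s_e) * sqrt(Σx² − (Σx)²/n)
    t = m / s_e * math.sqrt(sum_x2 - sum_x ** 2 / n)
    return t, n - 2, abs(t) > t_c


# Hypothetical data; 3.182 is the two-tailed critical value for α = 0.05, d.f. = 3.
t, df, reject = slope_t_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], t_c=3.182)
print(f"t = {t:.3f}, d.f. = {df}, reject H0: {reject}")
```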


In Exercises 35 and 36, test the claim and interpret the results in the context of the
problem. If convenient, use technology.

35. The table shows the weights (in pounds) and the numbers of hours slept
in a day by a random sample of infants. Test the claim that M ≠ 0. Use
α = 0.01.
Weight, x | 8.1 | 10.2 | 9.9 | 7.2 | 6.9 | 11.2 | 11 | 15
Hours slept, y | 14.8 | 14.6 | 14.1 | 14.2 | 13.8 | 13.2 | 13.9 | 12.5

36. The table shows the ages (in years) and salaries (in thousands of
dollars) of a random sample of engineers at a company. Test the claim
that M ≠ 0. Use α = 0.05.
Age, x | 25 | 34 | 29 | 30 | 42 | 38 | 49 | 52 | 35 | 40
Salary, y | 57.5 | 61.2 | 59.9 | 58.7 | 87.5 | 67.4 | 89.2 | 85.3 | 69.5 | 75.1


Confidence Intervals for y-Intercept and Slope You can construct
confidence intervals for the y-intercept B and slope M of the regression line
y = Mx + B for the population by using the inequalities below.

y-intercept B: b − E < B < b + E
where $E = t_c s_e \sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum x^2 - \frac{(\sum x)^2}{n}}}$ and

slope M: m − E < M < m + E
where $E = \dfrac{t_c s_e}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{n}}}$

The values of m and b are obtained from the sample data, and the critical value t_c
is found using Table 5 in Appendix B with n − 2 degrees of freedom.
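The two margins of error above share s_e and the quantity Σx² − (Σx)²/n, so both intervals can come from one pass over the data. The sketch below is a minimal Python version under stated assumptions: the function name and the five data pairs are hypothetical (not the gross domestic product and carbon dioxide data of Example 2), and t_c is supplied from Table 5 rather than computed.

```python
import math


def intercept_and_slope_cis(xs, ys, t_c):
    """Return ((B_low, B_high), (M_low, M_high)) for y = Mx + B.

    t_c is the critical value for the chosen level with n - 2 degrees
    of freedom, looked up by the caller.
    """
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - m * sum_x / n
    sse = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))

    x_bar = sum_x / n
    ss_x = sum_x2 - sum_x ** 2 / n  # Σx² − (Σx)²/n

    e_b = t_c * s_e * math.sqrt(1 / n + x_bar ** 2 / ss_x)
    e_m = t_c * s_e / math.sqrt(ss_x)
    return (b - e_b, b + e_b), (m - e_m, m + e_m)


# Hypothetical data; 3.182 is the 95% critical value for d.f. = 3.
b_ci, m_ci = intercept_and_slope_cis([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], t_c=3.182)
print("B:", b_ci)
print("M:", m_ci)
```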


In Exercises 37 and 38, construct the indicated confidence intervals for B and M 
using the gross domestic products and carbon dioxide emissions data found in 
Example 2. 


37. 95% confidence interval 38. 99% confidence interval 


SECTION 9.4 Multiple Regression 531 


What You Should Learn

» How to use technology to find and interpret a multiple regression equation, the standard error of estimate, and the coefficient of determination

» How to use a multiple regression equation to predict y-values

Tech Tip

Detailed instructions for using Minitab and Excel to find a multiple regression equation are shown in the technology manuals that accompany this text.


Finding a Multiple Regression Equation ■ Predicting y-Values


Finding a Multiple Regression Equation 


In many instances, a better prediction model can be found for a dependent 
(response) variable by using more than one independent (explanatory) variable. 
For instance, a more accurate prediction for the carbon dioxide emissions 
discussed in previous sections might be made by considering the number of 
cars as well as the gross domestic product. Models that contain more than one 
independent variable are multiple regression models. 


DEFINITION 


A multiple regression equation for independent variables x₁, x₂, x₃, ..., xₖ
and a dependent variable y has the form

ŷ = b + m₁x₁ + m₂x₂ + m₃x₃ + ... + mₖxₖ

where ŷ is the predicted y-value for given xᵢ values and b is the y-intercept.
The y-intercept b is the value of ŷ when all xᵢ are 0. Each coefficient mᵢ is the
amount of change in ŷ when the independent variable xᵢ is changed by one
unit and all other independent variables are held constant.


Because the mathematics associated with multiple regression is complicated, 
this section focuses on how to use technology to find a multiple regression 
equation and how to interpret the results. 


Finding a Multiple Regression Equation 


A researcher wants to determine how employee salaries at a company are 
related to the length of employment, previous experience, and education. The 
researcher selects eight employees from the company and obtains the data 
shown in the table. 


Employee | Salary (in dollars), y | Employment (in years), x₁ | Experience (in years), x₂ | Education (in years), x₃
A | 57,310 | 10 | 2 | 16
B | 57,380 | 5 | 6 | 16
C | 54,135 | 3 | 1 | 12
D | 56,985 | 6 | 5 | 14
E | 58,715 | 8 | 8 | 16
F | 60,620 | 20 | 0 | 12
G | 59,200 | 8 | 4 | 18
H | 60,320 | 14 | 6 | 17


Use Minitab to find a multiple regression equation that models the data. 




SOLUTION 


Enter the y-values in C1 and the x₁-, x₂-, and x₃-values in C2, C3, and C4,
respectively. Select “Regression > Regression > Fit Regression Model” from
the Stat menu. Using the salaries as the response variable and the remaining
data as the predictors, you should obtain results similar to the display shown.


MINITAB

Regression Analysis: Salary, y versus x1, x2, x3

Model Summary

S | R-sq | R-sq(adj)
659.490 | 94.38% | 90.17%

Coefficients

Term | Coef | SE Coef | T-Value | P-Value
Constant | b | 1981 | 25.12 | 0.000
x1 | m₁ | 48.3 | (illegible) | 0.002
x2 | m₂ | 124 | (illegible) | 0.140
x3 | m₃ | 147 | (illegible) | 0.144

Regression Equation

Salary, y = 49764 + 364.4 x1 + 228 x2 + 267 x3

Study Tip In Example 1, it is important that you interpret the coefficients m₁, m₂, and m₃ correctly. For instance, if x₂ and x₃ are held constant and x₁ increases by 1, then ŷ increases by $364. Similarly, if x₁ and x₃ are held constant and x₂ increases by 1, then ŷ increases by $228. If x₁ and x₂ are held constant and x₃ increases by 1, then ŷ increases by $267.


The regression equation is ŷ = 49,764 + 364x₁ + 228x₂ + 267x₃.
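Minitab does the fitting in Example 1. To show what such software computes, here is a minimal pure-Python sketch: it solves the least-squares normal equations (XᵀX)β = Xᵀy by Gaussian elimination, then reports S = √(SSE/(n − k − 1)) and R² = 1 − SSE/SST. The function names are hypothetical and this is not how Minitab is implemented; for larger or ill-conditioned problems a library least-squares routine (for example, NumPy's `numpy.linalg.lstsq`) is the safer choice.

```python
import math


def fit_multiple_regression(rows, ys):
    """Least-squares [b, m1, ..., mk] for y-hat = b + m1*x1 + ... + mk*xk."""
    X = [[1.0] + [float(v) for v in row] for row in rows]  # leading 1 for b
    p = len(X[0])
    # Augmented matrix [XᵀX | Xᵀy] of the normal equations.
    A = [[sum(xi[r] * xi[c] for xi in X) for c in range(p)]
         + [sum(xi[r] * y for xi, y in zip(X, ys))]
         for r in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    beta = [0.0] * p
    for r in reversed(range(p)):
        rest = sum(A[r][c] * beta[c] for c in range(r + 1, p))
        beta[r] = (A[r][p] - rest) / A[r][r]
    return beta


def summarize(rows, ys, beta):
    """Standard error of estimate S and coefficient of determination R²."""
    n, k = len(ys), len(rows[0])
    y_bar = sum(ys) / n
    preds = [beta[0] + sum(m * x for m, x in zip(beta[1:], row)) for row in rows]
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, preds))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return math.sqrt(sse / (n - k - 1)), 1 - sse / sst


# Example 1 data: employment x1, experience x2, education x3 (years); salary y (dollars).
rows = [[10, 2, 16], [5, 6, 16], [3, 1, 12], [6, 5, 14],
        [8, 8, 16], [20, 0, 12], [8, 4, 18], [14, 6, 17]]
ys = [57310, 57380, 54135, 56985, 58715, 60620, 59200, 60320]

beta = fit_multiple_regression(rows, ys)
s, r2 = summarize(rows, ys, beta)
print("b, m1, m2, m3 =", [round(v, 1) for v in beta])
print(f"S = {s:.3f}, R-sq = {100 * r2:.2f}%")
```

The results should agree with the Minitab display up to rounding.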
TRY IT YOURSELF 1 


A Statistics professor wants to determine how students’ final grades are related 
to the midterm exam grades and number of classes missed. The professor 
selects 10 students and obtains the data shown in the table. 


Student | Final grade, y | Midterm exam, x₁ | Classes missed, x₂
1 | 81 | 75 | 1
2 | 90 | 80 | 0
3 | 86 | 91 | 2
4 | 76 | 80 | 3
5 | 51 | 62 | 6
6 | 75 | 90 | 4
7 | 44 | 60 | 7
8 | 81 | 82 | (illegible)
9 | 94 | 88 | 0
10 | 93 | 96 | 1


Use technology to find a multiple regression equation that models the data. 
Answer: Page A38 


Minitab displays much more than the regression equation and the coefficients 
of the independent variables. For instance, it also displays the standard error of 
estimate, denoted by S, and the coefficient of determination, denoted by R-Sq. In 
Example 1, S = 659.490 and R-Sq = 94.38%. So, the standard error of estimate
is $659.49. The coefficient of determination tells you that 94.38% of the variation 
in y can be explained by the multiple regression model. The remaining 5.62% is 
unexplained and is due to other factors, such as sampling error, coincidence, or 
lurking variables. 


Picturing the World

In a lake in Finland, 159 fish of 7 species were caught and measured for weight G (in grams), length L (in centimeters), height H, and width W (H and W are percents of L). The regression equation for G and L is

G = −491 + 28.5L,  r ≈ 0.925, r² ≈ 0.855.

When all four variables are used, the regression equation is

G = −712 + 28.3L + 1.46H + 13.3W,  r ≈ 0.930, r² ≈ 0.865.

(Source: Journal of Statistics Education)

Predict the weight of a fish with the following measurements: L = 40, H = 17, and W = 11. How do your predictions vary when you use a single variable versus many variables? Which do you think is more accurate?
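The arithmetic for the fish question is a direct substitution into each equation. A short Python sketch follows; it assumes the garbled sign in front of the 1.46H term is a plus (as reconstructed above), and the function names are hypothetical. It only computes the two predictions; judging which is more accurate is left to the reader.

```python
def weight_one_variable(length_cm):
    """G = -491 + 28.5L, weight in grams."""
    return -491 + 28.5 * length_cm


def weight_four_variables(length_cm, height_pct, width_pct):
    """G = -712 + 28.3L + 1.46H + 13.3W, weight in grams (H and W are percents of L).

    Assumes the sign before 1.46H is '+'; the source text was garbled there.
    """
    return -712 + 28.3 * length_cm + 1.46 * height_pct + 13.3 * width_pct


print("length only:", weight_one_variable(40))
print("all four variables:", round(weight_four_variables(40, 17, 11), 2))
```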




Predicting y-Values 


After finding the equation of the multiple regression line, you can use the 
equation to predict y-values over the range of the data. To predict y-values, 
substitute the given value for each independent variable into the equation, then 
calculate ŷ.


Predicting y-Values Using Multiple Regression Equations 


Use the regression equation
ŷ = 49,764 + 364x₁ + 228x₂ + 267x₃
found in Example 1 to predict an employee’s salary for each set of conditions. 


1. 12 years of current employment 
5 years of previous experience 


16 years of education 


2. 4 years of current employment 
2 years of previous experience 


12 years of education 


3. 8 years of current employment 
7 years of previous experience 


17 years of education 


SOLUTION
To predict each employee’s salary, substitute the values for x₁, x₂, and x₃ into
the regression equation. Then calculate ŷ.
1. ŷ = 49,764 + 364x₁ + 228x₂ + 267x₃
= 49,764 + 364(12) + 228(5) + 267(16)
= 59,544
The employee’s predicted salary is $59,544.
2. ŷ = 49,764 + 364x₁ + 228x₂ + 267x₃
= 49,764 + 364(4) + 228(2) + 267(12)
= 54,880
The employee’s predicted salary is $54,880.

3. ŷ = 49,764 + 364x₁ + 228x₂ + 267x₃
= 49,764 + 364(8) + 228(7) + 267(17)
= 58,811
The employee’s predicted salary is $58,811.
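The substitution in Example 2 is the same dot product each time: b plus each coefficient times its x-value. A minimal Python sketch, with a hypothetical `predict` helper, reproduces the three salaries from the equation found in Example 1.

```python
def predict(b, coefficients, values):
    """Evaluate y-hat = b + m1*x1 + m2*x2 + ... + mk*xk."""
    if len(coefficients) != len(values):
        raise ValueError("need one x-value per coefficient")
    return b + sum(m * x for m, x in zip(coefficients, values))


B = 49_764
M = [364, 228, 267]  # m1 (employment), m2 (experience), m3 (education)
cases = [(12, 5, 16), (4, 2, 12), (8, 7, 17)]
for case in cases:
    print(case, "->", predict(B, M, case))  # 59544, 54880, 58811
```

Raising x₁ by 1 while holding x₂ and x₃ fixed changes the prediction by exactly m₁ = 364, which is the interpretation given in the Study Tip.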


TRY IT YOURSELF 2 

Use the regression equation found in Try It Yourself 1 to predict a student’s 
final grade for each set of conditions. 

1. A student has a midterm exam score of 89 and misses 1 class. 

2. A student has a midterm exam score of 78 and misses 3 classes. 


3. A student has a midterm exam score of 83 and misses 2 classes. 
Answer: Page A38 




9.4 EXERCISES 


For Extra Help: MyLab Statistics 


Building Basic Skills and Vocabulary 


Predicting y-Values In Exercises 1-4, use the multiple regression equation to
predict the y-values for the values of the independent variables.

1. Cauliflower Yield The equation used to predict the annual cauliflower yield
(in pounds per acre) is
ŷ = 24,791 + 4.508x₁ − 4.723x₂
where x₁ is the number of acres planted and x₂ is the number of acres
harvested. (Adapted from United States Department of Agriculture)
(a) x₁ = 36,500, x₂ = 36,100
(b) x₁ = 38,100, x₂ = 37,800
(c) x₁ = 39,000, x₂ = 38,800
(d) x₁ = 42,200, x₂ = 42,100
2. Sorghum Yield The equation used to predict the annual sorghum yield (in
bushels per acre) is

ŷ = 80.1 − 20.2x₁ + 21.2x₂

where x₁ is the number of acres planted (in millions) and x₂ is the
number of acres harvested (in millions). (Adapted from United States Department
of Agriculture)

(a) x₁ = 5.5, x₂ = 3.9
(b) x₁ = 8.3, x₂ = 7.3
(c) x₁ = 6.5, x₂ = 5.7
(d) x₁ = 9.4, x₂ = 7.8
3. Black Cherry Tree Volume The volume (in cubic feet) of a black cherry tree
can be modeled by the equation

ŷ = −52.2 + 0.3x₁ + 4.5x₂

where x₁ is the tree’s height (in feet) and x₂ is the tree’s diameter (in
inches). (Source: Journal of the Royal Statistical Society)

(a) x₁ = 70, x₂ = 8.6

(b) x₁ = 65, x₂ = 11.0

(c) x₁ = 83, x₂ = 17.6

(d) x₁ = 87, x₂ = 19.6


4. Elephant Weight The equation used to predict the weight of an elephant
(in kilograms) is
ŷ = −4016 + 11.5x₁ + 7.55x₂ + 12.5x₃
where x₁ represents the girth of the elephant (in centimeters), x₂ represents the
length of the elephant (in centimeters), and x₃ represents the circumference of
a footpad (in centimeters). (Source: Field Trip Earth)
(a) x₁ = 421, x₂ = 224, x₃ = 144
(b) x₁ = 311, x₂ = 171, x₃ = 102
(c) x₁ = 376, x₂ = 226, x₃ = 124
(d) x₁ = 231, x₂ = 135, x₃ = 86




Using and Interpreting Concepts 


Finding a Multiple Regression Equation In Exercises 5 and 6, use 
technology to find (a) the multiple regression equation for the data shown in the 
table, (b) the standard error of estimate, and (c) the coefficient of determination. 
Interpret the results. 


5. Used Cars The table shows the prices (in dollars), age (in years), and
mileage (in thousands of miles) of eight pre-owned Honda Civic Sedans.

Price, y | Age, x₁ | Mileage, x₂
9454 | 6 | 91.2
10,920 | 5 | 77.1
13,929 | 3 | 45.1
14,604 | 2 | (illegible)
11,500 | 4 | 52.1
15,308 | 2 | 34.7
14,500 | 3 | 35.6
14,878 | 3 | 21.6
8000 | 9 | 87.9


6. Shareholder’s Equity The table shows the net sales (in billions of dollars),
total assets (in billions of dollars), and shareholder’s equities (in billions of
dollars) for Wal-Mart for six years. (Adapted from Wal-Mart Stores, Inc.)

Shareholder’s equity, y | Net sales, x₁ | Total assets, x₂
71.3 | 443.9 | 193.4
76.3 | 465.6 | 202.9
76.3 | 473.1 | 204.5
81.4 | 482.2 | 203.5
80.5 | 478.6 | 199.6
77.8 | 481.3 | 198.8


Extending Concepts 


Adjusted r² The calculation of the coefficient of determination r² depends on the
number of data pairs and the number of independent variables. An adjusted value of
r² based on the number of degrees of freedom is calculated using the formula

$$r^2_{adj} = 1 - \frac{(1 - r^2)(n - 1)}{n - k - 1}$$

where n is the number of data pairs and k is the number of independent variables.
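The adjustment formula is a one-liner. The Python sketch below (hypothetical function name) applies it to the rounded values from Example 1, r² = 0.9438 with n = 8 employees and k = 3 independent variables; the result, about 0.9017, matches the R-sq(adj) = 90.17% in the Minitab display.

```python
def adjusted_r_squared(r2, n, k):
    """r2_adj = 1 - (1 - r2)(n - 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Example 1 of this section: r^2 = 0.9438, n = 8 employees, k = 3 variables.
print(adjusted_r_squared(0.9438, 8, 3))
```

Because (n − 1)/(n − k − 1) > 1 whenever k ≥ 1, the adjusted value is always below r², and the gap grows as variables are added relative to the number of data pairs.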


In Exercises 7 and 8, calculate $r^2_{adj}$ and determine the percentage of the variation
in y that can be explained by the relationships between variables according to $r^2_{adj}$.
Compare this result with the one obtained using r².

7. Calculate $r^2_{adj}$ for the data in Exercise 5.

8. Calculate $r^2_{adj}$ for the data in Exercise 6.




Statistics in the Real World 


Uses 


Correlation and Regression Correlation and regression analysis can be used 
to determine whether there is a significant relationship between two variables. 
When there is, you can use one of the variables to predict the value of the other 
variable. For instance, educators have used correlation and regression analysis 
to determine that there is a significant correlation between a student’s SAT 
score and the grade point average from a student’s freshman year at college. 
Consequently, many colleges and universities use SAT scores of high school 
applicants as a predictor of the applicant’s initial success at college. 


Abuses 


Confusing Correlation and Causation The most common abuse of correlation 
in studies is to confuse the concepts of correlation with those of causation 
(see page 502). Good SAT scores do not cause good college grades. Rather, there 
are other variables, such as good study habits and motivation, that contribute to 
both. When a strong correlation is found between two variables, look for other 
variables that are correlated with both. 


Considering Only Linear Correlation The correlation studied in this chapter
is linear correlation. When the correlation coefficient is close to 1 or close to −1,
the data points can be modeled by a straight line. It is possible for a correlation
coefficient to be close to 0 while there is still a strong correlation of a different type.
Consider the data listed in the table at the left. The value of the correlation
coefficient is 0. However, the data are perfectly correlated with the equation
x² + y² = 1, as shown in the figure at the left.
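The effect is easy to reproduce. The table at the left is not reproduced here, so the sketch below uses assumed stand-in data: eight evenly spaced points that lie exactly on x² + y² = 1. The computational formula for r (hypothetical helper name) returns zero up to floating-point rounding, even though every point satisfies the circle equation.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient r (computational formula)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return (n * sum_xy - sum_x * sum_y) / (
        math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    )


# Assumed stand-in data: eight evenly spaced points on x^2 + y^2 = 1.
a = math.sqrt(2) / 2
points = [(1, 0), (a, a), (0, 1), (-a, a), (-1, 0), (-a, -a), (0, -1), (a, -a)]
xs = [p[0] for p in points]
ys = [p[1] for p in points]

print("on circle:", all(math.isclose(x * x + y * y, 1.0) for x, y in points))
print("r =", pearson_r(xs, ys))  # zero, up to floating-point rounding
```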


Ethics 


When data are collected, all of the data should be used when calculating 
statistics. In this chapter, you learned that before finding the equation of a 
regression line, it is helpful to construct a scatter plot of the data to check for 
outliers, gaps, and clusters in the data. Researchers cannot use only those data 
points that fit their hypotheses or those that show a significant correlation. 
Although eliminating outliers may help a data set coincide with predicted 
patterns or fit a regression line, it is unethical to amend data in such a way. An 
outlier or any other point that influences a regression model can be removed 
only when it is properly justified. 

In most cases, the best and sometimes safest approach is to present
statistical measurements both with and without an outlier included. By
doing this, the decision as to whether or not to recognize the outlier is left to
the reader.


EXERCISES 


1. Confusing Correlation and Causation Find an example of an article 
that confuses correlation and causation. Discuss other variables that could 
contribute to the relationship between the variables. 


2. Considering Only Linear Correlation Find an example of two real-life 
variables that have a nonlinear correlation. 




Chapter Summary


What Did You Learn?

Section 9.1
» How to construct a scatter plot and how to find a correlation coefficient
$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}$$
» How to test a population correlation coefficient ρ using a table and how to
perform a hypothesis test for a population correlation coefficient ρ

Section 9.2
» How to find the equation of a regression line
$$\hat{y} = mx + b, \qquad m = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2}, \qquad b = \bar{y} - m\bar{x} = \frac{\sum y}{n} - m\frac{\sum x}{n}$$
» How to predict y-values using a regression equation

Section 9.3
» How to find and interpret the coefficient of determination
$$r^2 = \frac{\text{Explained variation}}{\text{Total variation}}$$
» How to find and interpret the standard error of estimate for a regression line
$$s_e = \sqrt{\frac{\sum \left(y_i - \hat{y}_i\right)^2}{n - 2}} = \sqrt{\frac{\sum y^2 - b\sum y - m\sum xy}{n - 2}}$$
» How to construct and interpret a prediction interval for y
$$\hat{y} - E < y < \hat{y} + E, \qquad E = t_c s_e \sqrt{1 + \frac{1}{n} + \frac{n\left(x_0 - \bar{x}\right)^2}{n\sum x^2 - \left(\sum x\right)^2}}$$

Section 9.4
» How to use technology to find and interpret a multiple regression equation,
the standard error of estimate, and the coefficient of determination
ŷ = b + m₁x₁ + m₂x₂ + m₃x₃ + ... + mₖxₖ
» How to use a multiple regression equation to predict y-values


Example(s): 1-5; 6, 7 (remaining entries illegible)
Review Exercises: 1-4; 5-8; 9-12; 13-18; 17, 18; 19-24; 25, 26; 27, 28
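The Section 9.1 and 9.2 formulas above share the same five sums (Σx, Σy, Σx², Σy², Σxy), so r, m, and b can be computed together. The minimal Python sketch below uses a hypothetical function name and made-up data; it is meant for checking hand calculations, not as the book's technology instructions.

```python
import math


def correlation_and_regression_line(xs, ys):
    """Return (r, m, b) using the Section 9.1 and 9.2 summary formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    numerator = n * sum_xy - sum_x * sum_y          # nΣxy − (Σx)(Σy)
    sx_term = n * sum_x2 - sum_x ** 2               # nΣx² − (Σx)²
    sy_term = n * sum_y2 - sum_y ** 2               # nΣy² − (Σy)²

    r = numerator / (math.sqrt(sx_term) * math.sqrt(sy_term))
    m = numerator / sx_term
    b = sum_y / n - m * sum_x / n                   # b = ȳ − m·x̄
    return r, m, b


r, m, b = correlation_and_regression_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"r = {r:.4f}, y-hat = {m:.2f}x + {b:.2f}")
```

For a simple linear regression, squaring r gives the coefficient of determination r² from Section 9.3.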




Review Exercises


Section 9.1 


In Exercises 1-4, (a) display the data in a scatter plot, (b) calculate the sample 
correlation coefficient r, and (c) describe the type of correlation and interpret the 
correlation in the context of the data. 


1. The numbers of pass attempts and passing yards for seven professional
quarterbacks for a recent regular season (Source: National Football League)

Pass attempts, x | 610 | 545 | 567 | 552 | 432 | 486 | 403
Passing yards, y | 4428 | 4240 | 4090 | 3877 | 3554 | 3401 | 2710


2. The numbers of wildland fires (in thousands) and wildland acres burned
(in millions) in the United States for eight years (Source: National Interagency
Coordination Center)

Fires, x | 78.8 | 72.0 | 74.1 | 67.8 | 47.6 | 63.3 | 68.2 | 67.7
Acres, y | 5.9 | 3.4 | 8.7 | 9.3 | 4.3 | 3.6 | 10.1 | 5.5


3. The intelligence quotient (IQ) scores and brain sizes, as measured by the
total pixel count (in thousands) from an MRI scan, for nine female college
students (Adapted from Intelligence)

IQ score, x | 138 | 140 | 96 | 83 | 101 | 135 | 85 | 77 | 88
Pixel count, y | 991 | 856 | 879 | 865 | 808 | 791 | 799 | 794 | 894


4. The annual per capita sugar consumptions (in kilograms) and the average
numbers of cavities of 11- and 12-year-old children in seven countries

Sugar consumption, x | 2.1 | 5.0 | 6.3 | 6.5 | 7.7 | 8.7 | 11.6
Cavities, y | 0.59 | 1.51 | 1.55 | 1.70 | 2.18 | 2.10 | 2.73


In Exercises 5-8, use Table 11 in Appendix B, or perform a hypothesis test using
Table 5 in Appendix B to make a conclusion about the correlation coefficient.

5. Refer to the data in Exercise 1. At α = 0.05, is there enough evidence to
conclude that there is a significant linear correlation between the data? (Use
the value of r found in Exercise 1.)

6. Refer to the data in Exercise 2. At α = 0.05, is there enough evidence to
conclude that there is a significant linear correlation between the data? (Use
the value of r found in Exercise 2.)

7. Refer to the data in Exercise 3. At α = 0.01, is there enough evidence to
conclude that there is a significant linear correlation between the data? (Use
the value of r found in Exercise 3.)

8. Refer to the data in Exercise 4. At α = 0.01, is there enough evidence to
conclude that there is a significant linear correlation between the data? (Use
the value of r found in Exercise 4.)




Section 9.2 


In Exercises 9-12, find the equation of the regression line for the data. Then 
construct a scatter plot of the data and draw the regression line. (Each pair of 
variables has a significant correlation.) Then use the regression equation to predict 
the value of y for each of the x-values, if meaningful. If the x-value is not meaningful 
to predict the value of y, explain why not. If convenient, use technology. 


9. The average number (in thousands) of milk cows and the amounts (in billions
of pounds) of milk produced in the United States for eight years (Source: U.S.
Department of Agriculture)

Milk cows, x | 9202 | 9123 | 9199 | 9237 | 9224 | 9257 | 9314 | 9328
Milk produced, y | 189.2 | 192.9 | 196.3 | 200.6 | 201.2 | 206.1 | 208.6 | 212.4

(a) x = 9080 cows (b) x = 9230 cows
(c) x = 9250 cows (d) x = 9300 cows


10. The average times (in hours) per day spent watching television for men
and women for 10 years (Source: U.S. Bureau of Labor Statistics)

Men, x | 2.80 | 2.88 | 3.01 | 3.10 | 2.94 | 2.99 | 3.07 | 2.98 | 3.05 | 3.02
Women, y | 2.36 | 2.38 | 2.55 | 2.56 | 2.53 | 2.53 | 2.61 | 2.57 | 2.61 | 2.56

(a) x = 2.85 hours (b) x = 2.97 hours
(c) x = 3.04 hours (d) x = 3.13 hours


11. The ages (in years) and the numbers of hours of sleep in one night for
seven adults

Age, x | 35 | 20 | 59 | 42 | 68 | 38 | 75
Hours of sleep, y | 7 | 9 | 5 | 6 | 5 | 8 | 4

(a) x = 16 years (b) x = 25 years
(c) x = 85 years (d) x = 50 years

12. The engine displacements (in cubic inches) and the fuel efficiencies (in miles
per gallon) of seven automobiles

Displacement, x | 170 | 134 | 220 | 305 | 109 | 256 | 322
Fuel efficiency, y | 29.5 | 34.5 | 23.0 | 17.0 | 33.5 | 23.0 | 15.5

(a) x = 86 cubic inches (b) x = 198 cubic inches
(c) x = 289 cubic inches (d) x = 407 cubic inches




Section 9.3 


In Exercises 13-16, use the value of the correlation coefficient r to calculate
the coefficient of determination r². What does this tell you about the explained
variation of the data about the regression line? about the unexplained variation?

13. r = −0.450
14. r = −0.937
15. r = 0.642
16. r = 0.795


In Exercises 17 and 18, use the data to (a) find the coefficient of determination r²
and interpret the result, and (b) find the standard error of estimate s_e and interpret
the result.

17. The table shows the combined city and highway fuel efficiency (in miles
per gallon gasoline equivalent) and top speeds (in miles per hour) for nine
hybrid and electric cars. The regression equation is ŷ = −0.465x + 139.433.
(Source: Car and Driver)

Fuel efficiency, x | 114 | 95 | 120 | 105 | 107 | 116 | 118 | 68 | 84
Top speed, y | 80 | 103 | 78 | 85 | 92 | 88 | 92 | 105 | 101

18. The table shows the cooking areas (in square inches) of 18 gas
grills and their prices (in dollars). The regression equation is
ŷ = 2.335x − 853.278. (Source: Lowe’s)

Area, x | 650 | 669 | 529 | 725 | 844 | 445 | 669 | 844 | 740
Price, y | 149 | 699 | 499 | 374 | 1599 | 187 | 1299 | 899 | 374

Area, x | 529 | 450 | 644 | 600 | 575 | 998 | 529 | 265 | 530
Price, y | 599 | 399 | 499 | 269 | 299 | 1999 | 519 | 99 | 109


In Exercises 19-24, construct the indicated prediction interval and interpret 
the results. 


19. Construct a 90% prediction interval for the amount of milk produced in 
Exercise 9 when there are an average of 9275 milk cows. 


20. Construct a 90% prediction interval for the average time women spend per 
day watching television in Exercise 10 when the average time men spend per 
day watching television is 3.08 hours. 


21. Construct a 95% prediction interval for the number of hours of sleep for an 
adult in Exercise 11 who is 45 years old. 


22. Construct a 95% prediction interval for the fuel efficiency of an automobile 
in Exercise 12 that has an engine displacement of 265 cubic inches. 


23. Construct a 99% prediction interval for the top speed of a hybrid or electric 
car in Exercise 17 that has a combined city and highway fuel economy of 
90 miles per gallon equivalent. 


24. Construct a 99% prediction interval for the price of a gas grill in Exercise 18 
with a usable cooking area of 900 square inches. 




Section 9.4 


In Exercises 25 and 26, use technology to find (a) the multiple regression equation
for the data shown in the table, (b) the standard error of estimate, and (c) the
coefficient of determination. Interpret the result.

25. The table shows the carbon monoxide, tar, and nicotine content, all
in milligrams, of 14 brands of U.S. cigarettes. (Source: Federal Trade
Commission)

Carbon monoxide, y | Tar, x₁ | Nicotine, x₂
15 | 16 | 1.1
17 | 16 | 1.0
11 | 10 | 0.8
12 | (illegible) | 0.9
14 | 13 | 0.8
16 | 14 | 0.8
14 | 16 | 1.2
16 | 16 | 1.2
10 | 10 | 0.8
18 | 19 | 1.4
17 | 17 | 1.2
11 | 12 | 1.0
10 | 9 | 0.7
14 | 15 | 1.2


26. The table shows the numbers of acres planted, the numbers of acres
harvested, and the annual yields (in pounds) of spinach for five years. (Source:
United States Department of Agriculture)

Yield, y | Acres planted, x₁ | Acres harvested, x₂
15,200 | 36,400 | 35,000
18,600 | 35,400 | 32,900
17,900 | 34,400 | 32,300
18,600 | 38,500 | 36,600
16,000 | 36,400 | 35,680


In Exercises 27 and 28, use the multiple regression equation to predict the y-values
for the values of the independent variables.

27. An equation that can be used to predict fuel economy (in miles per gallon)
for automobiles is
ŷ = 41.3 − 0.004x₁ − 0.0049x₂

where x₁ is the engine displacement (in cubic inches) and x₂ is the vehicle
weight (in pounds).

(a) x₁ = 305, x₂ = 3750 (b) x₁ = 225, x₂ = 3100
(c) x₁ = 105, x₂ = 2200 (d) x₁ = 185, x₂ = 3000

28. Use the regression equation found in Exercise 25.

(a) x₁ = 10, x₂ = 0.7 (b) x₁ = 15, x₂ = 1.1
(c) x₁ = 13, x₂ = 1.0 (d) x₁ = 9, x₂ = 0.8




Chapter Quiz


Take this quiz as you would take a quiz in class. After you are done, check your 
work against the answers given in the back of the book. 


For Exercises 1-8, use the data in the table, which shows the average annual
salaries (both in thousands of dollars) for secondary and elementary school
teachers, excluding special and vocational education teachers, in the United
States for 11 years. (Source: Bureau of Labor Statistics)

Secondary school teachers, x | Elementary school teachers, y
51.2 | 48.7
52.5 | 50.0
54.4 | 52.2
55.2 | 53.2
56.0 | 54.3
56.8 | 55.3
57.8 | 56.1
58.3 | 56.3
59.3 | 56.8
60.4 | 57.7
61.4 | 59.0

1. Construct a scatter plot for the data. Do the data appear to have a positive
linear correlation, a negative linear correlation, or no linear correlation?
Explain.

2. Calculate the correlation coefficient r and interpret the result.

3. Test the significance of the correlation coefficient r that you found in
Exercise 2. Use α = 0.05.

4. Find the equation of the regression line for the data. Draw the regression line
on the scatter plot that you constructed in Exercise 1.

5. Use the regression equation that you found in Exercise 4 to predict the
average annual salary of elementary school teachers when the average annual
salary of secondary school teachers is $52,500.

6. Find the coefficient of determination r² and interpret the result.

7. Find the standard error of estimate s_e and interpret the result.

8. Construct a 95% prediction interval for the average annual salary of
elementary school teachers when the average annual salary of secondary
school teachers is $52,500. Interpret the results.

9. Stock Price The equation used to predict the stock price (in dollars) at the
end of the year for a restaurant chain is

ŷ = −86 + 7.46x₁ − 1.61x₂

where x₁ is the total revenue (in billions of dollars) and x₂ is the shareholders’
equity (in billions of dollars). Use the multiple regression equation to predict
the y-values for the values of the independent variables.

(a) x₁ = 27.6, x₂ = 15.3 (b) x₁ = 24.1, x₂ = 14.6
(c) x₁ = 23.5, x₂ = 13.4 (d) x₁ = 22.8, x₂ = 15.3




Chapter Test

Take this test as you would take a test in class.
1. Net Sales The equation used to predict the net sales (in millions of dollars)
for a fiscal year for a clothing retailer is
ŷ = 23,769 + 9.18x₁ − 8.41x₂

where x₁ is the number of stores open at the end of the fiscal year and x₂ is
the average square footage per store. Use the multiple regression equation to
predict the y-values for the values of the independent variables.

(a) x₁ = 1057, x₂ = 3698 (b) x₁ = 1012, x₂ = 3659
(c) x₁ = 952, x₂ = 3601 (d) x₁ = 914, x₂ = 3594

For Exercises 2-9, use the data in the table, which shows the average annual
salaries (both in thousands of dollars) for librarians and postsecondary
library science teachers in the United States for 12 years. (Source: Bureau of
Labor Statistics)


Librarians, x Library science teachers, y 


49.1 56.6 
50.9 57.6 
52.9 59.7 
54.7 61.6 
25.7 64.3 
56.4 67.0 
57.0 70.0 
57.2 70.8 
57.6 73.3 
58.1 72.4 
58.9 73.0 
59.9 72.3 


2. Construct a scatter plot for the data. Do the data appear to have a positive 
linear correlation, a negative linear correlation, or no linear correlation? 
Explain. 


3. Calculate the correlation coefficient r and interpret the result. 


4. Test the significance of the correlation coefficient r that you found in 
Exercise 3. Use a = 0.01. 


5. Find the equation of the regression line for the data. Draw the regression line 
on the scatter plot that you constructed in Exercise 2. 


6. Use the regression equation that you found in Exercise 5 to predict the 
average annual salary of postsecondary library science teachers when the 
average annual salary of librarians is $56,000. 


7. Find the coefficient of determination r² and interpret the result.
8. Find the standard error of estimate s_e and interpret the result.


9. Construct a 99% prediction interval for the average annual salary of 
postsecondary library science teachers when the average annual salary of 
librarians is $56,000. Interpret the results. 


Putting it all together 


REAL DECISIONS 


Acid rain affects the environment by increasing the acidity of lakes 
and streams to dangerous levels, damaging trees and soil, accelerating 
the decay of building materials and paint, and destroying national 
monuments. The goal of the Environmental Protection Agency’s 
(EPA) Acid Rain Program is to achieve environmental health 
benefits by reducing the emissions of the primary causes of acid rain: 
sulfur dioxide and nitrogen oxides. 

You work for the EPA and you want to determine whether there 
is a significant correlation between the average concentrations of sulfur 
dioxide and nitrogen dioxide. 


EXERCISES 


1. Analyzing the Data 


(a) The data in the table show the annual averages of the daily 
maximum concentrations of sulfur dioxide (in parts per billion) 
and nitrogen dioxide (in parts per billion) for 12 years. Construct 
a scatter plot of the data and make a conclusion about the type 
of correlation between the average concentrations of sulfur 
dioxide and nitrogen dioxide. 


(b) Calculate the correlation coefficient r and verify your conclusion 
in part (a). 

(c) Test the significance of the correlation coefficient found in 
part (b). Use α = 0.05.


(d) Find the equation of the regression line for the average 
concentrations of sulfur dioxide and nitrogen dioxide. Add 
the graph of the regression line to your scatter plot in part (a). 
Does the regression line appear to be a good fit? 


(e) Can you use the equation of the regression line to predict the 
average concentration of nitrogen dioxide given the average 
concentration of sulfur dioxide? Why or why not? 

(f) Find the coefficient of determination r² and the standard error
of estimate sₑ. Interpret your results.


2. Making Predictions 


Construct a 95% prediction interval for the average concentration of 
nitrogen dioxide when the average concentration of sulfur dioxide is 
28 parts per billion. Interpret the results. 


544 CHAPTER 9 Correlation and Regression 


Average sulfur dioxide concentration, x | Average nitrogen dioxide concentration, y
75.6 | 56.3
74.9 | 55.9
68.9 | 54.9
64.7 | 33.2
59.0 | 52.2
50.8 | 48.0
46.3 | 47.5
37.9 | 47.9
36.7 | 44.8
30.5 | 45.9
31.9 | 46.9
29.3 | 44.6


(Source: Environmental Protection Agency) 


TECHNOLOGY 


TI-84 PLUS 


Nutrients in Breakfast Cereals




The U.S. Food and Drug Administration (FDA) requires 
nutrition labeling for most foods. Under FDA regulations, 
manufacturers are required to list the amounts of certain 
nutrients in their foods, such as calories, sugar, fat, and 
carbohydrates. This nutritional information is displayed in 
the “Nutrition Facts” panel on the food’s package. 

The table shows the nutritional content of one cup of each
of 21 different breakfast cereals.


C = calories 

S = sugar in grams 

F = fat in grams 

R = carbohydrates in grams 


1. Use technology to draw a scatter plot of the (x, y)
pairs in each data set.


(a) (calories, sugar) 

(b) (calories, fat) 

(c) (calories, carbohydrates)
(d) (sugar, fat) 

(e) (sugar, carbohydrates) 

(f) (fat, carbohydrates) 


2. From the scatter plots in Exercise 1, which pairs of 
variables appear to have a strong linear correlation? 


3. Use technology to find the correlation coefficient for 
each pair of variables in Exercise 1. Which has the 
strongest linear correlation? 


Extended solutions are given in the technology manuals that accompany this text. 
Technical instruction is provided for Minitab, Excel, and the TI-84 Plus. 




4. Use technology to find an equation of a regression line
for each pair of variables.


(a) (calories, sugar) 
(b) (calories, carbohydrates) 
5. Use the results of Exercise 4 to predict each value.


(a) The sugar content of one cup of cereal that has 
120 calories 


(b) The carbohydrate content of one cup of cereal that 
has 120 calories 


6. Use technology to find the multiple regression
equations of each form.
(a) $C = b + m_1 S + m_2 F + m_3 R$


7. Use the equations from Exercise 6 to predict the
calories in 1 cup of cereal that has 7 grams of sugar,
0.5 gram of fat, and 31 grams of carbohydrates.


Technology 545 




10 


Chi-Square Tests and
the F-Distribution




Case Study 


10.3 


10.4


Uses and Abuses 
Real Statistics—Real Decisions 
Technology 


Crash tests performed by the Insurance Institute for Highway Safety demonstrate how
a vehicle reacts in a realistic collision. Tests are performed on the front, side,
rear, and roof of the vehicles. Results of these tests are classified using the ratings good,
acceptable, marginal, and poor.


Where You’ve Been


In Chapter 8, you learned how to test a hypothesis that
compares two populations by basing your decisions on
sample statistics and their distributions. For instance, the
Insurance Institute for Highway Safety buys new vehicles
each year and crashes them into a barrier at 40 miles per
hour to compare how different vehicles protect drivers in
a frontal offset crash. In this test, 40% of the total width of
the vehicle strikes the barrier on the driver side. The forces
and impacts that occur during a crash test are measured by
equipping dummies with special instruments and placing
them in the car. The crash test results include data on head,
chest, and leg injuries. For a low crash test number, the
injury potential is low. If the crash test number is high,
then the injury potential is high. Using the techniques of
Chapter 8, you can determine whether the mean chest
injury potential is the same for midsize SUVs and large
pickups. (Assume the populations are normally distributed
and the population variances are equal.) The table shows
the sample statistics. (Adapted from Insurance Institute for
Highway Safety)


Vehicle | Number | Mean chest injury | Standard deviation
Large Pickups | n₁ = 12 | x̄₁ = 23.0 | s₁ = 2.09
Midsize SUVs | n₂ = 19 | x̄₂ = 22.4 | s₂ = 4.26


For the means of chest injury, the P-value for the hypothesis
that μ₁ = μ₂ is about 0.6655. At α = 0.01, you fail to reject
the null hypothesis. So, you do not have enough evidence
to conclude that there is a significant difference in the
means of the chest injury potential in a frontal offset crash
at 40 miles per hour for large pickups and midsize SUVs.


Where You’re Going


In this chapter, you will learn how to test a hypothesis that
compares three or more populations.

For instance, in addition to the crash tests for large pickups 
and midsize SUVs, a third group of vehicles was also tested. 
The table shows the results for all three types of vehicles. 


Vehicle | Number | Mean chest injury | Standard deviation
Large Pickups | n₁ = 12 | x̄₁ = 23.0 | s₁ = 2.09
Midsize SUVs | n₂ = 19 | x̄₂ = 22.4 | s₂ = 4.26
Large Cars | n₃ = 10 | x̄₃ = 27.2 | s₃ = 6.65


From these three samples, is there evidence of a difference 
in chest injury potential among large pickups, midsize 
SUVs, and large cars in a frontal offset crash at 40 miles 
per hour? 


You can answer this question by testing the hypothesis that 
the three means are equal. For the means of chest injury, 
the P-value for the hypothesis that μ₁ = μ₂ = μ₃ is about
0.0283. At α = 0.01, you fail to reject the null hypothesis.
So, there is not enough evidence at the 1% level of 
significance to conclude that at least one of the means is 
different from the others. 




548 CHAPTER 10. Chi-Square Tests and the F-Distribution 


What You Should Learn 


How to use the chi-square
distribution to test whether a 
frequency distribution fits an 
expected distribution 


Study Tip


The hypothesis tests 
described in Sections 10.1 
and 10.2 can be used for 
qualitative data. 




The Chi-Square Goodness-of-Fit Test 


A tax preparation company wants to determine the proportions of people 
who used different methods to prepare their taxes. To determine these 
proportions, the company can perform a multinomial experiment. A multinomial 
experiment is a probability experiment consisting of a fixed number of 
independent trials in which there are more than two possible outcomes for each 
trial. The probability of each outcome is fixed, and each outcome is classified into 
categories. (Remember from Section 4.2 that a binomial experiment has only 
two possible outcomes.) 

The company wants to test a retail trade association’s claim concerning 
the expected distribution of proportions of people who used different methods 
to prepare their taxes. To do so, the company could compare the distribution 
of proportions obtained in the multinomial experiment with the association’s 
expected distribution. To compare the distributions, the company can perform a 
chi-square goodness-of-fit test. 


DEFINITION 


A chi-square goodness-of-fit test is used to test whether a frequency
distribution fits an expected distribution.


To begin a goodness-of-fit test, you must first state a null and an alternative 
hypothesis. Generally, the null hypothesis states that the frequency distribution 
fits an expected distribution and the alternative hypothesis states that the 
frequency distribution does not fit the expected distribution. 

For instance, the association claims that the expected distribution of people 
who used different methods to prepare their taxes is as shown below. 


Distribution of tax preparation methods 


Accountant 24% 
By hand 20% 
Computer software 35% 
Friend/family 6% 
Tax preparation service 15% 


To test the association’s claim, the company can perform a chi-square 
goodness-of-fit test using these null and alternative hypotheses. 


H₀: The expected distribution of tax preparation methods is 24% by
accountant, 20% by hand, 35% by computer software, 6% by friend 
or family, and 15% by tax preparation service. (Claim) 


Hₐ: The distribution of tax preparation methods differs from the expected
distribution. 


SECTION 10.1 Goodness-of-Fit Test 549 


To calculate the test statistic for the chi-square goodness-of-fit test, you can
use observed frequencies and expected frequencies. To calculate the expected
frequencies, you must assume the null hypothesis is true.


DEFINITION


The observed frequency O of a category is the frequency for the category
observed in the sample data.


The expected frequency E of a category is the calculated frequency for
the category. Expected frequencies are found by using the expected (or
hypothesized) distribution and the sample size. The expected frequency for
the ith category is

$$E_i = np_i$$

where n is the number of trials (the sample size) and pᵢ is the assumed
probability of the ith category.


Picturing the World


The pie chart shows the distribution of health care visits to doctor offices,
emergency departments, and home visits in a recent year. (Source: National
Center for Health Statistics)

[Pie chart: 1-3 visits, 50.4%; 10 or more visits, 11.5%]

A researcher randomly selects 300 people and asks them how many visits they
make to the doctor in a year: 1-3, 4-9, 10 or more, or none. What is the
expected frequency for each response?


Finding Observed Frequencies and Expected Frequencies


A tax preparation company randomly selects 300 adults and asks them how they
prepare their taxes. The results are shown below. Find the observed frequency
and the expected frequency (using the association’s expected distribution) for
each tax preparation method. (Adapted from National Retail Federation)


Survey results (n = 300)

Accountant 63
By hand 40
Computer software 115
Friend/family 29
Tax preparation service 53


SOLUTION 


The observed frequency for each tax preparation method is the number of 
adults in the survey naming a particular tax preparation method. The expected 
frequency for each tax preparation method is the product of the number of 
adults in the survey and the assumed probability that an adult will name a 
particular tax preparation method. The observed frequencies and expected 
frequencies are shown in the table below. 


Tax preparation method | % of people | Observed frequency | Expected frequency
Accountant | 24% | 63 | 300(0.24) = 72
By hand | 20% | 40 | 300(0.20) = 60
Computer software | 35% | 115 | 300(0.35) = 105
Friend/family | 6% | 29 | 300(0.06) = 18
Tax preparation service | 15% | 53 | 300(0.15) = 45


TRY IT YOURSELF 1 


The tax preparation company in Example 1 decides it wants a larger sample 
size, so it randomly selects 500 adults. Find the expected frequency for each tax 
preparation method for n = 500. 

Answer: Page A38 


The sum of the expected frequencies always equals the sum of the observed 
frequencies. For instance, in Example 1 the sum of the observed frequencies and 
the sum of the expected frequencies are both 300. 
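As a quick check of the formula Eᵢ = npᵢ, the short Python sketch below recomputes the expected frequencies for the tax preparation survey and confirms that their total equals the sample size. The dictionary layout and the function name `expected_frequencies` are illustrative choices, not part of the text.

```python
# Expected frequencies for a goodness-of-fit test: E_i = n * p_i,
# computed under the assumption that the null hypothesis is true.

def expected_frequencies(n, proportions):
    """Return {category: n * p} for each hypothesized proportion p."""
    return {category: n * p for category, p in proportions.items()}


# Claimed distribution (the association's percentages as proportions)
claimed = {
    "Accountant": 0.24,
    "By hand": 0.20,
    "Computer software": 0.35,
    "Friend/family": 0.06,
    "Tax preparation service": 0.15,
}

# Observed frequencies from the survey of 300 adults
observed = {
    "Accountant": 63,
    "By hand": 40,
    "Computer software": 115,
    "Friend/family": 29,
    "Tax preparation service": 53,
}

n = sum(observed.values())  # sample size, 300
expected = expected_frequencies(n, claimed)

for category in claimed:
    print(f"{category}: O = {observed[category]}, E = {expected[category]:g}")

# Both columns have the same total (up to floating-point rounding)
print(f"Total: O = {n}, E = {sum(expected.values()):g}")
```

Because the hypothesized proportions sum to 1, the expected frequencies always add up to n, which is why the two totals match.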




Before performing a chi-square goodness-of-fit test, you must verify that 
(1) the observed frequencies were obtained from a random sample and (2) each 
expected frequency is at least 5. Note that when the expected frequency of a 
category is less than 5, it may be possible to combine the category with another 
one to meet the second requirement. 


The Chi-Square Goodness-of-Fit Test 


Study Tip
Remember that a chi-square distribution is positively skewed and its shape
is determined by the degrees of freedom. Its graph is not symmetric, but it
appears to become more symmetric as the degrees of freedom increase, as
shown in Section 6.4.


To perform a chi-square goodness-of-fit test, these conditions must be met. 
1. The observed frequencies must be obtained using a random sample. 
2. Each expected frequency must be greater than or equal to 5. 


If these conditions are met, then the sampling distribution for the test is 
approximated by a chi-square distribution with k − 1 degrees of freedom,
where k is the number of categories. The test statistic is

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O represents the observed frequency of each category and E represents
the expected frequency of each category.


When the observed frequencies closely match the expected frequencies, the 
differences between O and E will be small and the chi-square test statistic will 
be close to 0. As such, the null hypothesis is unlikely to be rejected. However, 
when there are large discrepancies between the observed frequencies and the 
expected frequencies, the differences between O and E will be large, resulting
in a large chi-square test statistic. A large chi-square test statistic is evidence for 
rejecting the null hypothesis. So, the chi-square goodness-of-fit test is always a 
right-tailed test. 
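The test statistic itself is a short sum. The Python sketch below shows one way to compute it from lists of observed and expected frequencies; the function name `chi_square_statistic` is hypothetical, and the data are the tax preparation frequencies from Example 1. Identical lists give a statistic of 0, and larger discrepancies give larger values, which is the right-tailed reasoning described above.

```python
def chi_square_statistic(observed, expected):
    """Return chi^2 = sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Tax preparation survey: observed and expected frequencies (n = 300)
observed = [63, 40, 115, 29, 53]
expected = [72, 60, 105, 18, 45]

stat = chi_square_statistic(observed, expected)
print(f"chi-square = {stat:.3f}")
```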


GUIDELINES 


Performing a Chi-Square Goodness-of-Fit Test

In Words | In Symbols
1. Verify that the observed frequencies were obtained from a random sample and each expected frequency is at least 5. | $E_i = np_i \geq 5$
2. Identify the claim. State the null and alternative hypotheses. | State H₀ and Hₐ.
3. Specify the level of significance. | Identify α.
4. Identify the degrees of freedom. | d.f. = k − 1
5. Determine the critical value. | Use Table 6 in Appendix B.
6. Determine the rejection region.
7. Find the test statistic and sketch the sampling distribution. | $\chi^2 = \sum \frac{(O - E)^2}{E}$
8. Make a decision to reject or fail to reject the null hypothesis. | If χ² is in the rejection region, then reject H₀. Otherwise, fail to reject H₀.
9. Interpret the decision in the context of the original claim.
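The guidelines can be sketched in Python as shown here. This sketch assumes you want to replace the table lookup in Step 5 with a computed critical value: the hypothetical helper `chi2_sf` evaluates the right-tail area of a chi-square distribution with a standard recurrence on the degrees of freedom, and `chi2_critical_value` inverts it by bisection. All function names are illustrative. The random-sampling part of Step 1 cannot be checked by code, so only the expected-frequency condition is verified.

```python
import math


def chi2_sf(x, df):
    """Right-tail area P(X > x) for a chi-square variable with df degrees of freedom.

    Uses Q(x; k + 2) = Q(x; k) + (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1),
    starting from Q(x; 1) = erfc(sqrt(x/2)) or Q(x; 2) = exp(-x/2).
    """
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        q, k = math.exp(-half), 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1
    while k < df:
        q += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return q


def chi2_critical_value(alpha, df, tol=1e-9):
    """Step 5: find x0 with P(X > x0) = alpha by bisection (chi2_sf decreases in x)."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # widen the bracket until it contains x0
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def goodness_of_fit_test(observed, proportions, alpha):
    """Return (test statistic, critical value, reject H0?)."""
    n = sum(observed)
    expected = [n * p for p in proportions]
    # Step 1 (partly): every expected frequency must be at least 5
    if min(expected) < 5:
        raise ValueError("an expected frequency is below 5; combine categories")
    df = len(observed) - 1                   # Step 4: d.f. = k - 1
    x0 = chi2_critical_value(alpha, df)      # Steps 5-6: reject when chi2 > x0
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))  # Step 7
    return stat, x0, stat > x0               # Step 8


stat, x0, reject = goodness_of_fit_test(
    [63, 40, 115, 29, 53], [0.24, 0.20, 0.35, 0.06, 0.15], alpha=0.01
)
print(f"chi-square = {stat:.3f}, critical value = {x0:.3f}, reject H0: {reject}")
```

With the tax preparation data and α = 0.01, the computed critical value agrees with the table value 13.277 to three decimal places, and the statistic of about 16.888 falls in the rejection region. Step 9, interpreting the decision, is still left to the reader.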


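Tax preparation method | Observed frequency | Expected frequency
Accountant | 63 | 72
By hand | 40 | 60
Computer software | 115 | 105
Friend/family | 29 | 18
Tax preparation service | 53 | 45

[Figure: chi-square distribution with d.f. = 4; rejection region to the right of χ₀² = 13.277, with area α = 0.01; the test statistic χ² ≈ 16.888 falls in the rejection region]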




Performing a Chi-Square Goodness-of-Fit Test 


A retail trade association claims that the tax preparation methods of adults 
are distributed as shown in the table at the left below. A tax preparation 
company randomly selects 300 adults and asks them how they prepare their 
taxes. The results are shown in the table at the right below. At α = 0.01, test
the association’s claim. (Adapted from National Retail Federation) 


Distribution of tax preparation methods | Survey results (n = 300)
Accountant 24% | Accountant 63
By hand 20% | By hand 40
Computer software 35% | Computer software 115
Friend/family 6% | Friend/family 29
Tax preparation service 15% | Tax preparation service 53


SOLUTION


The observed and expected frequencies are shown in the table at the left. The 
expected frequencies were calculated in Example 1. Because the observed 
frequencies were obtained using a random sample and each expected 
frequency is at least 5, you can use the chi-square goodness-of-fit test to test the 
proposed distribution. Here are the null and alternative hypotheses. 


H₀: The expected distribution of tax preparation methods is 24% by
accountant, 20% by hand, 35% by computer software, 6% by friend 
or family, and 15% by tax preparation service. (Claim) 


Hₐ: The distribution of tax preparation methods differs from the expected
distribution. 


Because there are 5 categories, the chi-square distribution has

$$\text{d.f.} = k - 1 = 5 - 1 = 4$$

degrees of freedom. With d.f. = 4 and α = 0.01, the critical value is χ₀² = 13.277.
The rejection region is

$$\chi^2 > 13.277.$$


With the observed and expected frequencies, the chi-square test statistic is

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(63 - 72)^2}{72} + \frac{(40 - 60)^2}{60} + \frac{(115 - 105)^2}{105} + \frac{(29 - 18)^2}{18} + \frac{(53 - 45)^2}{45} \approx 16.888.$$


The figure at the left shows the location of the rejection region and the
chi-square test statistic. Because χ² is in the rejection region, you reject the
null hypothesis.

Interpretation There is enough evidence at the 1% level of significance 
to reject the claim that the distribution of tax preparation methods and the 
association’s expected distribution are the same. 


Ages | Previous age distribution | Survey results
0-9 | 16% | 76
10-19 | 20% | 84
20-29 | 8% | 30
30-39 | 14% | 60
40-49 | 15% | 54
50-59 | 12% | 40
60-69 | 10% | 42
70+ | 5% | 14


TRY IT YOURSELF 2 


A sociologist claims that the age distribution for the residents of a city is 
different from the distribution 10 years ago. The distribution of ages 10 years 
ago is shown in the table at the left. You randomly select 400 residents and 
record the age of each. The survey results are shown in the table. At α = 0.05,
perform a chi-square goodness-of-fit test to test whether the distribution has 
changed. 

Answer: Page A38 


The chi-square goodness-of-fit test is often used to determine whether a 
distribution is uniform. For such tests, the expected frequencies of the categories 
are equal. When testing a uniform distribution, you can find the expected 
frequency of each category by dividing the sample size by the number of 
categories. For instance, suppose a company believes that the number of sales 
made by its sales force is uniform throughout a five-day workweek. If the sample 
consists of 1000 sales, then the expected frequency of sales for each day is
1000/5 = 200.


Performing a Chi-Square Goodness-of-Fit Test 

A researcher claims that the number of different-colored candies in bags 
of dark chocolate M&M’s® is uniformly distributed. To test this claim, 
you randomly select a bag that contains 500 dark chocolate M&M’s®. The
results are shown in the table below. At α = 0.10, test the researcher’s claim.
(Adapted from Mars, Incorporated)


Color Frequency, f 


Brown 80 
Yellow 95 
Red 88 
Blue 83 
Orange 76 
Green 78 


SOLUTION 


The claim is that the distribution is uniform, so the expected frequencies of the 
colors are equal. To find each expected frequency, divide the sample size by 
the number of colors. So, for each color, E = 500/6 ≈ 83.333. Because each
expected frequency is at least 5 and the M&M’s® were randomly selected, you 
can use the chi-square goodness-of-fit test to test the expected distribution. 
Here are the null and alternative hypotheses. 


H₀: The expected distribution of the different-colored candies in bags of
dark chocolate M&M’s® is uniform. (Claim) 


Hₐ: The distribution of the different-colored candies in bags of dark
chocolate M&M’s® is not uniform. 


Because there are 6 categories, the chi-square distribution has

$$\text{d.f.} = k - 1 = 6 - 1 = 5$$

degrees of freedom. Using d.f. = 5 and α = 0.10, the critical value is
χ₀² = 9.236. The rejection region is χ² > 9.236. To find the chi-square test
statistic using a table, use the observed and expected frequencies, as shown
below.




O | E | O − E | (O − E)² | (O − E)²/E
80 | 83.333 | −3.333 | 11.108889 | 0.133307201
95 | 83.333 | 11.667 | 136.118889 | 1.633433202
88 | 83.333 | 4.667 | 21.780889 | 0.261371713
83 | 83.333 | −0.333 | 0.110889 | 0.001330673
76 | 83.333 | −7.333 | 53.772889 | 0.645277249
78 | 83.333 | −5.333 | 28.440889 | 0.341292033

$$\chi^2 = \sum \frac{(O - E)^2}{E} \approx 3.016$$


The figure shows the location of the rejection region and the chi-square 
test statistic. Because χ² is not in the rejection region, you fail to reject the
null hypothesis. 


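[Figure: chi-square distribution with d.f. = 5; rejection region to the right of χ₀² = 9.236, with area α = 0.10; the test statistic χ² ≈ 3.016 lies outside the rejection region]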


Interpretation There is not enough evidence at the 10% level of significance 
to reject the claim that the distribution of the different-colored candies in bags 
of dark chocolate M&M’s® is uniform. 


TRY IT YOURSELF 3 


A researcher claims that the number of different-colored candies in bags of
peanut M&M’s® is uniformly distributed. To test this claim, you randomly
select a bag that contains 180 peanut M&M’s®. The results are shown in the
table below. Using α = 0.05, test the researcher’s claim. (Adapted from Mars,
Incorporated)

Color | Frequency, f
Brown | 22
Yellow | 27
Red | 22
Blue | 41
Orange | 41
Green | 27


Answer: Page A38


You can use technology and a P-value to perform a chi-square goodness-of-fit
test. For instance, using a TI-84 Plus and the data in Example 3, you obtain
P = 0.6975171071, as shown below. Because P > α, you fail to reject the
null hypothesis.

[TI-84 PLUS screen: χ²GOF-Test output, P = .6975171071]
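The same P-value can be computed without a calculator. The Python sketch below assumes the uniform null hypothesis, so every expected frequency is n/k, and it uses a hypothetical helper `chi2_sf` (not a library function) that evaluates the right-tail area of a chi-square distribution with a standard recurrence on the degrees of freedom. For the dark chocolate M&M’s® counts it reproduces χ² = 3.016 and a P-value of about 0.6975, which is greater than α = 0.10.

```python
import math


def chi2_sf(x, df):
    """Right-tail area P(X > x) of a chi-square distribution (recurrence on df)."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        q, k = math.exp(-half), 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1
    while k < df:
        q += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return q


def uniform_gof_test(observed):
    """Chi-square goodness-of-fit test against a uniform distribution.

    Returns (test statistic, P-value) using d.f. = k - 1.
    """
    k = len(observed)
    e = sum(observed) / k  # every expected frequency is n / k
    stat = sum((o - e) ** 2 / e for o in observed)
    return stat, chi2_sf(stat, k - 1)


# Dark chocolate M&M's: brown, yellow, red, blue, orange, green
counts = [80, 95, 88, 83, 76, 78]
alpha = 0.10
stat, p = uniform_gof_test(counts)
print(f"chi-square = {stat:.3f}, P = {p:.4f}, reject H0: {p <= alpha}")
```

The decision rule is the P-value form of the test: reject H₀ when P ≤ α, otherwise fail to reject.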




10.1 EXERCISES 


For Extra Help: MyLab Statistics


Building Basic Skills and Vocabulary 


1. What is a multinomial experiment? 


2. What conditions are necessary to use the chi-square goodness-of-fit test? 


Finding Expected Frequencies In Exercises 3-6, find the expected
frequency for the values of n and pᵢ.


3. n = 125, pᵢ = 0.4    4. n = 800, pᵢ = 0.7
5. n = 350, pᵢ = 0.35    6. n = 610, pᵢ = 0.89


Using and Interpreting Concepts 


Performing a Chi-Square Goodness-of-Fit Test In Exercises 7-16,
(a) identify the claim and state H₀ and Hₐ, (b) find the critical value and identify
the rejection region, (c) find the chi-square test statistic, (d) decide whether to reject 
or fail to reject the null hypothesis, and (e) interpret the decision in the context of 
the original claim. 


7. Ages of Moviegoers A researcher claims that the ages of people who go 
to movies at least once a month are distributed as shown in the figure. You 
randomly select 1000 people who go to movies at least once a month and 
record the age of each. The table shows the results. At α = 0.10, test the
researcher’s claim. (Source: Motion Picture Association of America) 


[Figure: Distribution of ages of people who go to movies at least once a month]


Survey results

Age | Frequency, f
2-17 | 240
18-24 | 209
25-39 | 203
40-49 | 106
50+ | 242


8. Coffee A researcher claims that the numbers of cups of coffee U.S. adults 
drink per day are distributed as shown in the figure. You randomly select 
1600 U.S. adults and ask them how many cups of coffee they drink per 
day. The table shows the results. At α = 0.05, test the researcher’s claim.
(Source: Gallup) 


[Figure: How many cups-o-joe do you drink per day? About 64% of Americans drink at least 1 cup per day.]


Survey results

Response | Frequency, f
0 cups | 570
1 cup | 432
2 cups | 282
3 cups | 152
4 or more cups | 164




9. Ordering Delivery A research firm claims that the distribution of the days 
of the week that people are most likely to order food for delivery is different 
from the distribution shown in the figure. You randomly select 500 people 
and record which day of the week each is most likely to order food for 
delivery. The table shows the results. At α = 0.01, test the research firm’s
claim. (Source: Technomic, Inc.) 


[Figure: Food at your door. Days of the week Americans order food for delivery]


Survey results

Day | Frequency, f
Sunday | 43
Monday | 16
Tuesday | 25
Wednesday | 49
Thursday | 46
Friday | 168
Saturday | 153


10. Going Cashless A financial analyst claims that the distribution of people
who use cash to make their purchases is different from the distribution
shown in the figure. You randomly select 600 people and record the way they
make purchases. The table shows the results. At α = 0.01, test the financial
analyst’s claim. (Adapted from Gallup)


[Figure: Making purchases. All purchases with cash, 19%; Most purchases with cash, 17%; Half of purchases with cash, 20%; Some purchases with cash, 33%; No purchases with cash, 11%]


Survey results

Response | Frequency, f
All purchases with cash | 60
Most purchases with cash | 84
Half of purchases with cash | 132
Some purchases with cash | 252
No purchases with cash | 72


11. Homicides by County A researcher claims that the number of homicide 
crimes in California by county is uniformly distributed. To test this claim, 
you randomly select 1000 homicides from a recent year and record the 
county in which each happened. The table shows the results. At α = 0.01,
test the researcher’s claim. (Adapted from California Department of Justice) 


County Frequency, f County Frequency, f 
Alameda 116 Sacramento 90 
Contra Costa 35 San Bernardino 89 
Fresno 77 San Diego 45
Kern 62 San Francisco 51 
Los Angeles 101 San Joaquin 62 
Monterey 58 Santa Clara 39 
Orange 30 Stanislaus 37 
Riverside 65 Tulare 43 




12. Violent Crimes by Year A researcher claims that the number of violent 
crimes in England and Wales by year is uniformly distributed. To test this 
claim, you randomly select 22,526 crimes from recent years and record the 
year when each happened. The table shows the results. At α = 0.05, test the
researcher’s claim. (Adapted from Office for National Statistics, UK) 


Year Frequency, f Year Frequency, f 


2003 2307 2009 1774 
2004 2213 2010 1687 
2005 2010 2011 1896 
2006 1984 2012 1744 
2007 2103 2013 1666 
2008 1815 2014 1327 


13. College Education The pie chart shows the results of a survey in which 
U.S. parents were asked their opinions on whether a college education 
is worth the expense. An economist claims that the distribution of the 
opinions of U.S. teenagers is different from the distribution given for U.S. 
parents. To test this claim, you randomly select 200 U.S. teenagers and ask 
each whether a college education is worth the expense. The table shows the 
results. At α = 0.05, test the economist’s claim. (Adapted from Upromise, Inc.)


[Figure: U.S. parents’ opinions on whether a college education is worth the expense. Strongly agree, 55%; Neither agree nor disagree; Somewhat disagree, 6%; Strongly disagree, 4%]


Survey results

Response | Frequency, f
Strongly agree | 86
Somewhat agree | 62
Neither agree nor disagree | 34
Somewhat disagree | 14
Strongly disagree | 4


14. Money Management The pie chart shows the results of a survey in which 
married U.S. male adults were asked how much they trust their spouses 
to manage their finances. A financial services company claims that the 
distribution of how much married U.S. female adults trust their spouses to 
manage their finances is the same as the distribution given for married U.S. 
male adults. To test this claim, you randomly select 400 married U.S. female 
adults and ask each how much she trusts her spouse to manage their finances. 
The table shows the results. At α = 0.10, test the company’s claim. (Adapted
from Country Financial) 


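[Figure: How much married U.S. male adults trust their spouses to manage their finances. Trust with certain aspects, 27.8%; Do not trust, 5.7%]


Survey results

Response | Frequency, f
Completely trust | 243
Trust with certain aspects | 108
Do not trust | 36
Not sure | 13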


Type of Visitors Frequency, f 


Students 518 
Patients 497 
Tourists 985 


TABLE FOR EXERCISE 15 




15. Tourism An organization claims that the number of people visiting Egypt 
each year, which includes students, patients, and tourists, is not uniformly 
distributed. To test this claim, you randomly select 2000 prospective visitors 
and ask them the purpose of their visit. The table at the left shows the results. 
At α = 0.01, test the organization’s claim. (Adapted from Knoema)


16. Births by Day of the Week A doctor claims that the number of births by 
day of the week is uniformly distributed. To test this claim, you randomly 
select 700 births from a recent year and record the day of the week on which 
each takes place. The table shows the results. At α = 0.10, test the doctor’s
claim. (Adapted from National Center for Health Statistics) 


Day Frequency, f 
Sunday 68 
Monday 108 
Tuesday 115 
Wednesday 113 
Thursday 111 
Friday 108 
Saturday 77 


Extending Concepts 


Testing for Normality Using a chi-square goodness-of-fit test, you can decide, 
with some degree of certainty, whether a variable is normally distributed. In all 
chi-square tests for normality, the null and alternative hypotheses are as listed below. 


H₀: The variable has a normal distribution.
Hₐ: The variable does not have a normal distribution.


To determine the expected frequencies when performing a chi-square test for 
normality, first find the mean and standard deviation of the frequency distribution. 
Then, use the mean and standard deviation to compute the z-score for each class 
boundary. Then, use the z-scores to calculate the area under the standard normal 
curve for each class. Multiplying the resulting class areas by the sample size yields 
the expected frequency for each class. 
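The steps for finding expected frequencies in a normality test can be sketched in Python with `statistics.NormalDist` from the standard library. This sketch makes three assumptions that the text does not spell out: the mean and standard deviation are computed from the class midpoints, the standard deviation is the sample version (dividing by n − 1), and each class area is taken strictly between its two boundaries, so the tails beyond the first and last boundaries are not assigned to any class. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def normality_expected_frequencies(boundaries, frequencies):
    """Expected class frequencies under H0: the variable is normally distributed.

    boundaries  -- the k + 1 class boundaries, in increasing order
    frequencies -- the k observed class frequencies
    """
    n = sum(frequencies)
    midpoints = [(a + b) / 2 for a, b in zip(boundaries, boundaries[1:])]
    mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n
    # Sample standard deviation of the grouped data (n - 1 in the denominator)
    var = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, midpoints)) / (n - 1)
    sd = math.sqrt(var)
    z = [(b - mean) / sd for b in boundaries]   # z-score of each class boundary
    phi = NormalDist().cdf                      # standard normal cumulative area
    areas = [phi(z2) - phi(z1) for z1, z2 in zip(z, z[1:])]
    return [n * area for area in areas]         # expected frequency = n * class area


# Frequency distribution of 200 test scores (Exercise 17)
boundaries = [49.5, 58.5, 67.5, 76.5, 85.5, 94.5]
frequencies = [19, 61, 82, 34, 4]
expected = normality_expected_frequencies(boundaries, frequencies)
for (lo, hi), e in zip(zip(boundaries, boundaries[1:]), expected):
    print(f"{lo}-{hi}: E = {e:.2f}")
```

Because the tails are left out under these assumptions, the expected frequencies add up to slightly less than the sample size; a variant that extends the first and last classes to infinity would make them sum to n exactly.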


In Exercises 17 and 18, (a) find the expected frequencies, (b) find the critical value 
and identify the rejection region, (c) find the chi-square test statistic, (d) decide 
whether to reject or fail to reject the null hypothesis, and (e) interpret the decision 
in the context of the original claim. 


17. Test Scores At α = 0.01, test the claim that the 200 test scores shown in the
frequency distribution are normally distributed.

Class boundaries | 49.5-58.5 | 58.5-67.5 | 67.5-76.5 | 76.5-85.5 | 85.5-94.5
Frequency, f | 19 | 61 | 82 | 34 | 4


18. Test Scores At α = 0.05, test the claim that the 400 test scores shown in the
frequency distribution are normally distributed.

Class boundaries | 50.5-60.5 | 60.5-70.5 | 70.5-80.5 | 80.5-90.5 | 90.5-100.5
Frequency, f | 28 | 106 | 151 | 97 | 18




What You Should Learn 


How to use a contingency table
to find expected frequencies 


How to use a chi-square
distribution to test whether two 
variables are independent 


Study Tip 


Note that “2 × 5” is
read as “two-by-five.” 


Study Tip 


In a contingency table, the
notation $E_{r,c}$ represents
the expected frequency
for the cell in row r,
column c. For instance, in
the table above, $E_{1,4}$
represents the expected frequency
for the cell in row 1, column 4.


10.2 Independence




Contingency Tables • The Chi-Square Independence Test


Contingency Tables 


In Section 3.2, you learned that two events are independent when the occurrence 
of one event does not affect the probability of the occurrence of the other 
event. For instance, the outcomes of a roll of a die and a toss of a coin are 
independent. But, suppose a medical researcher wants to determine whether 
there is a relationship between caffeine consumption and heart attack risk. Are 
these variables independent or are they dependent? In this section, you will learn 
how to use the chi-square test for independence to answer such a question. To 
perform a chi-square test for independence, you will use sample data that are 
organized in a contingency table. 


DEFINITION 


An r × c contingency table shows the observed frequencies for two variables.
The observed frequencies are arranged in r rows and c columns. The
intersection of a row and a column is called a cell.


A 2 × 5 contingency table is shown below. It has two rows and five columns
and shows the results of a random sample of 2197 adults classified by two
variables: favorite way to eat ice cream and gender. From the table, you can see
that 182 of the adults who prefer ice cream in a sundae are males, and 158 of the
adults who prefer ice cream in a sundae are females.

Favorite way to eat ice cream
Gender | Cup | Cone | Sundae | Sandwich | Other
Male | 504 | 287 | 182 | 43 | 53
Female | 474 | 401 | 158 | 45 | 50

(Adapted from The Harris Poll)


Assuming two variables are independent, you can use a contingency table to 
find the expected frequency for each cell, as shown in the next definition. 


Finding the Expected Frequency for Contingency Table Cells

The expected frequency for a cell $E_{r,c}$ in a contingency table is

$$\text{Expected frequency } E_{r,c} = \frac{(\text{Sum of row } r)\cdot(\text{Sum of column } c)}{\text{Sample size}}$$
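To make the formula concrete, here is a minimal Python sketch for a single cell. The function name `expected_frequency` is a hypothetical helper, not part of any statistics package; the inputs are the Male row total (1069), the Cone column total (688), and the sample size (2197) from the ice cream data, as totaled in Example 1.

```python
def expected_frequency(row_sum: float, col_sum: float, sample_size: float) -> float:
    """Expected frequency E_{r,c} = (sum of row r)(sum of column c) / sample size."""
    if sample_size <= 0:
        raise ValueError("sample size must be positive")
    return row_sum * col_sum / sample_size


# Male row total, Cone column total, and sample size from the ice cream table
e_male_cone = expected_frequency(1069, 688, 2197)
print(round(e_male_cone, 3))  # 334.762
```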


When you find the sum of each row and column in a contingency table, you
are calculating the marginal frequencies. A marginal frequency is the frequency
with which an entire category of one of the variables occurs. For instance, in the table
above, the marginal frequency for adults who prefer ice cream in a cone is
287 + 401 = 688. The observed frequencies in the interior of a contingency
table are called joint frequencies.
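The marginal frequencies are just row and column sums of the joint frequencies. A minimal Python sketch, assuming the table is stored as a list of rows (the helper name `marginal_frequencies` is hypothetical), using the ice cream counts above:

```python
def marginal_frequencies(table):
    """Return (row totals, column totals, sample size) for a list-of-rows table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]  # zip(*table) walks the columns
    return row_totals, col_totals, sum(row_totals)


ice_cream = [
    [504, 287, 182, 43, 53],  # Male: Cup, Cone, Sundae, Sandwich, Other
    [474, 401, 158, 45, 50],  # Female
]
rows, cols, n = marginal_frequencies(ice_cream)
print(rows)  # [1069, 1128]
print(cols)  # [978, 688, 340, 88, 103]
print(n)     # 2197
```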


Study Tip

In Example 1, after finding $E_{1,1} \approx 475.868$, you can find $E_{2,1}$ by subtracting 475.868 from the first column's total, 978. So, $E_{2,1} \approx 978 - 475.868 = 502.132$. In general, you can find the expected frequency for the last cell in a column by subtracting the expected frequencies for the other cells in that column from the column's total. Similarly, you can do this for the last cell in a row using the row's total.
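The shortcut works because the row totals add up to the sample size, so the expected frequencies in any column add up to that column's total. A short Python check of the Study Tip, using the totals from Example 1 (variable names are illustrative only):

```python
row_totals = [1069, 1128]               # Male, Female
col_totals = [978, 688, 340, 88, 103]   # Cup, Cone, Sundae, Sandwich, Other
n = sum(row_totals)                     # 2197

# E_{1,1} from the formula, then E_{2,1} by subtracting it from the column total
e_11 = row_totals[0] * col_totals[0] / n
e_21_shortcut = col_totals[0] - e_11
# E_{2,1} directly from the formula, for comparison
e_21_formula = row_totals[1] * col_totals[0] / n
print(round(e_11, 3), round(e_21_shortcut, 3), round(e_21_formula, 3))
# 475.868 502.132 502.132
```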


SECTION 10.2 Independence 559 


In Example 1, notice that the marginal frequencies for the contingency table 
have already been calculated. 


Finding Expected Frequencies 


Find the expected frequency for each cell in the contingency table. Assume 
that the variables favorite way to eat ice cream and gender are independent. 


Favorite way to eat ice cream
Gender | Cup | Cone | Sundae | Sandwich | Other | Total
Male | 504 | 287 | 182 | 43 | 53 | 1069
Female | 474 | 401 | 158 | 45 | 50 | 1128
Total | 978 | 688 | 340 | 88 | 103 | 2197


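SOLUTION
After calculating the marginal frequencies, you can use the formula

$$\text{Expected frequency } E_{r,c} = \frac{(\text{Sum of row } r)\cdot(\text{Sum of column } c)}{\text{Sample size}}$$

to find each expected frequency. The expected frequencies for the first row are

$$E_{1,1} = \frac{1069 \cdot 978}{2197} \approx 475.868 \qquad E_{1,2} = \frac{1069 \cdot 688}{2197} \approx 334.762$$
$$E_{1,3} = \frac{1069 \cdot 340}{2197} \approx 165.435 \qquad E_{1,4} = \frac{1069 \cdot 88}{2197} \approx 42.818$$
$$E_{1,5} = \frac{1069 \cdot 103}{2197} \approx 50.117$$

and the expected frequencies for the second row are

$$E_{2,1} = \frac{1128 \cdot 978}{2197} \approx 502.132 \qquad E_{2,2} = \frac{1128 \cdot 688}{2197} \approx 353.238$$
$$E_{2,3} = \frac{1128 \cdot 340}{2197} \approx 174.565 \qquad E_{2,4} = \frac{1128 \cdot 88}{2197} \approx 45.182$$
$$E_{2,5} = \frac{1128 \cdot 103}{2197} \approx 52.883.$$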
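The same arithmetic for every cell can be sketched in a few lines of Python. The helper name `expected_table` is hypothetical, and the observed counts are the ice cream data from the table above; rounding to three decimals reproduces the values in the solution.

```python
def expected_table(observed):
    """Expected frequency for every cell, assuming the two variables are independent."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


observed = [
    [504, 287, 182, 43, 53],  # Male
    [474, 401, 158, 45, 50],  # Female
]
for row in expected_table(observed):
    print([round(e, 3) for e in row])
# [475.868, 334.762, 165.435, 42.818, 50.117]
# [502.132, 353.238, 174.565, 45.182, 52.883]
```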


TRY IT YOURSELF 1 


The marketing consultant for a travel agency wants to determine whether 
certain travel concerns are related to travel purpose. The contingency table 
shows the results of a random sample of 300 travelers classified by their 
primary travel concern and travel purpose. Assume that the variables travel 
concern and travel purpose are independent. Find the expected frequency for 
each cell. (Adapted from NPD Group for Embassy Suites) 


Travel concern
Travel purpose | Hotel room | Legroom on plane | Rental car size | Other
Business | 36 | 108 | 14 | 22
Leisure | 38 | 54 | 14 | 14


Answer: Page A38 




Picturing the World

A researcher wants to determine whether a relationship exists between where people work (workplace or home) and their educational attainment. The results of a random sample of 925 employed persons are shown in the contingency table. (Source: U.S. Bureau of Labor Statistics)

Where they work
Educational attainment | Workplace | Home
Less than high school | 35 | 2
High school diploma | 250 | 21
Some college | 226 | 30
BA degree or higher | 293 | 68

Can the researcher use this sample to test for independence using a chi-square independence test? Why or why not?




The Chi-Square Independence Test 


After finding the expected frequencies, you can test whether the variables are 
independent using a chi-square independence test. 


DEFINITION 


A chi-square independence test is used to test the independence of two
variables. Using this test, you can determine whether the occurrence of one
variable affects the probability of the occurrence of the other variable.


Before performing a chi-square independence test, you must verify that 
(1) the observed frequencies were obtained from a random sample and (2) each 
expected frequency is at least 5. 


The Chi-Square Independence Test 


To perform a chi-square independence test, these conditions must be met. 
1. The observed frequencies must be obtained using a random sample. 
2. Each expected frequency must be greater than or equal to 5. 
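Only the second condition can be checked from the counts; whether the data came from a random sample depends on how they were collected. A minimal Python sketch of the second check (function names are hypothetical), applied to the ice cream data, whose smallest expected frequency is the male "Sandwich" cell:

```python
def expected_frequencies(observed):
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def meets_expected_frequency_condition(observed, threshold=5.0):
    """Check condition 2 only: every expected frequency is >= threshold.

    Condition 1 (a random sample) is a property of how the data were
    collected and cannot be verified from the counts themselves.
    """
    return all(e >= threshold for row in expected_frequencies(observed) for e in row)


ice_cream = [[504, 287, 182, 43, 53], [474, 401, 158, 45, 50]]
smallest = min(min(row) for row in expected_frequencies(ice_cream))
print(round(smallest, 3), meets_expected_frequency_condition(ice_cream))  # 42.818 True
```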


If these conditions are met, then the sampling distribution for the test is
approximated by a chi-square distribution with

$$\text{d.f.} = (r-1)(c-1)$$

degrees of freedom, where r and c are the number of rows and columns,
respectively, of a contingency table. The test statistic is

$$\chi^2 = \sum \frac{(O-E)^2}{E}$$

where O represents the observed frequencies and E represents the expected
frequencies.
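The two formulas above translate directly into code. A minimal stdlib Python sketch (the function name `chi_square_statistic` is hypothetical) that computes both the test statistic and the degrees of freedom, checked against the ice cream data used later in this section:

```python
def chi_square_statistic(observed):
    """Return (chi-square statistic, degrees of freedom) for an r x c table.

    Uses chi^2 = sum of (O - E)^2 / E over all cells, with
    E = (row total)(column total) / sample size and d.f. = (r - 1)(c - 1).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for r_total, obs_row in zip(row_totals, observed):
        for c_total, o in zip(col_totals, obs_row):
            e = r_total * c_total / n
            stat += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(col_totals) - 1)
    return stat, df


ice_cream = [[504, 287, 182, 43, 53], [474, 401, 158, 45, 50]]
stat, df = chi_square_statistic(ice_cream)
print(round(stat, 2), df)  # 20.07 4
```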


To begin the independence test, you must first state a null hypothesis and 
an alternative hypothesis. For a chi-square independence test, the null and 
alternative hypotheses are always some variation of these statements. 


$H_0$: The variables are independent.
$H_a$: The variables are dependent.


The expected frequencies are calculated on the assumption that the two 
variables are independent. If the variables are independent, then you can expect 
little difference between the observed frequencies and the expected frequencies. 
When the observed frequencies closely match the expected frequencies, the 
differences between O and E will be small and the chi-square test statistic will be 
close to 0. As such, the null hypothesis is unlikely to be rejected. 

For dependent variables, however, you can expect large discrepancies between
the observed frequencies and the expected frequencies. When the differences 
between O and E are large, the chi-square test statistic is also large. A large 
chi-square test statistic is evidence for rejecting the null hypothesis. So, the 
chi-square independence test is always a right-tailed test. 




GUIDELINES

Performing a Chi-Square Independence Test
In Words | In Symbols
1. Verify that the observed frequencies were obtained from a random sample and each expected frequency is at least 5.
2. Identify the claim. State the null and alternative hypotheses. | State $H_0$ and $H_a$.
3. Specify the level of significance. | Identify α.
4. Determine the degrees of freedom. | $\text{d.f.} = (r-1)(c-1)$
5. Determine the critical value. | Use Table 6 in Appendix B.
6. Determine the rejection region.
7. Find the test statistic and sketch the sampling distribution. | $\chi^2 = \sum \frac{(O-E)^2}{E}$
8. Make a decision to reject or fail to reject the null hypothesis. | If $\chi^2$ is in the rejection region, then reject $H_0$. Otherwise, fail to reject $H_0$.
9. Interpret the decision in the context of the original claim.

Study Tip

A contingency table with three rows and four columns will have $(3-1)(4-1) = (2)(3) = 6$ degrees of freedom.
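The computational steps of these guidelines can be sketched in Python roughly as follows. This is a minimal illustration, not a replacement for Table 6: the critical value is passed in by hand (13.277 and 7.815 are the values the text uses in the next two examples), and the function name `chi_square_independence_test` is hypothetical. Random sampling cannot be checked from counts, so only the expected-frequency condition is enforced.

```python
def chi_square_independence_test(observed, critical_value, min_expected=5.0):
    """Return (test statistic, degrees of freedom, reject_h0) for an r x c table.

    `critical_value` must be looked up separately (for example, in Table 6)
    for d.f. = (r - 1)(c - 1) and the chosen level of significance.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Step 1 (partly): every expected frequency must be at least 5
    if min(min(row) for row in expected) < min_expected:
        raise ValueError("an expected frequency is below 5; the test does not apply")

    # Step 4: degrees of freedom
    df = (len(observed) - 1) * (len(observed[0]) - 1)

    # Step 7: chi-square test statistic
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

    # Step 8: the test is right-tailed, so reject H0 when stat exceeds the critical value
    return stat, df, stat > critical_value


ice_cream = [[504, 287, 182, 43, 53], [474, 401, 158, 45, 50]]
print(chi_square_independence_test(ice_cream, 13.277))  # statistic about 20.07, d.f. 4, True

exercise = [[40, 53, 26, 6], [34, 68, 37, 11]]
print(chi_square_independence_test(exercise, 7.815))    # statistic about 3.49, d.f. 3, False
```

If SciPy is installed, `scipy.stats.chi2.ppf(1 - alpha, df)` can supply the critical value instead of a table lookup.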


Performing a Chi-Square Independence Test 


The contingency table shows the results of a random sample of 2197 adults 
classified by their favorite way to eat ice cream and gender. The expected 
frequencies are displayed in parentheses. At α = 0.01, can you conclude that
the variables favorite way to eat ice cream and gender are related?

Favorite way to eat ice cream
Gender | Cup | Cone | Sundae | Sandwich | Other | Total
Male | 504 (475.868) | 287 (334.762) | 182 (165.435) | 43 (42.818) | 53 (50.117) | 1069
Female | 474 (502.132) | 401 (353.238) | 158 (174.565) | 45 (45.182) | 50 (52.883) | 1128
Total | 978 | 688 | 340 | 88 | 103 | 2197
SOLUTION 


The expected frequencies were calculated in Example 1. Because each 
expected frequency is at least 5 and the adults were randomly selected, you 
can use the chi-square independence test to test whether the variables are 
independent. Here are the null and alternative hypotheses. 


$H_0$: The variables favorite way to eat ice cream and gender are independent.
$H_a$: The variables favorite way to eat ice cream and gender are dependent. (Claim)




The contingency table has two rows and five columns, so the chi-square
distribution has

$$\text{d.f.} = (r-1)(c-1) = (2-1)(5-1) = 4$$

degrees of freedom. Because d.f. = 4 and α = 0.01, the critical value is
$\chi_0^2 = 13.277$. The rejection region is $\chi^2 > 13.277$. You can use a table to find
the chi-square test statistic, as shown below.

O | E | O - E | (O - E)² | (O - E)²/E
504 | 475.868 | 28.132 | 791.409424 | 1.663086032
287 | 334.762 | -47.762 | 2281.208644 | 6.814419331
182 | 165.435 | 16.565 | 274.399225 | 1.658652794
43 | 42.818 | 0.182 | 0.033124 | 0.0007736
53 | 50.117 | 2.883 | 8.311689 | 0.165845701
474 | 502.132 | -28.132 | 791.409424 | 1.576098365
401 | 353.238 | 47.762 | 2281.208644 | 6.457993319
158 | 174.565 | -16.565 | 274.399225 | 1.571902873
45 | 45.182 | -0.182 | 0.033124 | 0.000733124
50 | 52.883 | -2.883 | 8.311689 | 0.157171284

$$\chi^2 = \sum \frac{(O-E)^2}{E} \approx 20.067$$

The figure at the right shows the location of the rejection region and the chi-square
test statistic. Because $\chi^2 \approx 20.067$ is in the rejection region, you reject the
null hypothesis.

Figure: chi-square distribution with d.f. = 4 and α = 0.01; the rejection region lies to the right of $\chi_0^2 = 13.277$, and the test statistic $\chi^2 \approx 20.067$ falls inside it.

Interpretation There is enough evidence at the 1% level of significance to conclude
that the variables favorite way to eat ice cream and gender are dependent.


TRY IT YOURSELF 2 


The marketing consultant for a travel agency wants to determine whether travel 
concerns are related to travel purpose. The contingency table shows the results 
of a random sample of 300 travelers classified by their primary travel concern 
and travel purpose. At α = 0.01, can the consultant conclude that the variables
travel concern and travel purpose are related? (The expected frequencies are
displayed in parentheses.) (Adapted from NPD Group for Embassy Suites)

Travel concern
Travel purpose | Hotel room | Legroom on plane | Rental car size | Other | Total
Business | 36 (44.4) | 108 (97.2) | 14 (16.8) | 22 (21.6) | 180
Leisure | 38 (29.6) | 54 (64.8) | 14 (11.2) | 14 (14.4) | 120
Total | 74 | 162 | 28 | 36 | 300


Answer: Page A38 


Study Tip
You can also use a P-value to perform a chi-square independence test. For
instance, in Example 3, note that Minitab displays $P = 0.322$. Because $P > \alpha$,
you fail to reject the null hypothesis.
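The P-value for this right-tailed test is the area under the chi-square curve to the right of $\chi^2$. As a hedged, standard-library-only sketch (the function name `chi_square_p_value` is hypothetical), that area has a closed form when d.f. is a positive integer, via the upper incomplete gamma function for integer and half-integer shapes. For general use, SciPy's `scipy.stats.chi2.sf(stat, df)` computes the same quantity.

```python
import math


def chi_square_p_value(stat, df):
    """Right-tail area P(chi-square with df degrees of freedom >= stat), df a positive int."""
    if not isinstance(df, int) or df < 1:
        raise ValueError("df must be a positive integer")
    if stat <= 0:
        return 1.0
    y = stat / 2.0
    if df % 2 == 0:
        # Even df: P = e^{-y} * sum_{i=0}^{df/2 - 1} y^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: P = erfc(sqrt(y)) + e^{-y} * sum_{i=1}^{(df-1)/2} y^{i-1/2} / Gamma(i + 1/2)
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return total


print(round(chi_square_p_value(3.493, 3), 3))  # 0.322 (Example 3)
print(chi_square_p_value(20.067, 4) < 0.01)    # True (ice cream example)
```

Comparing the P-value to α gives the same decision as comparing the test statistic to the critical value.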




Using Technology for a Chi-Square Independence Test 


A health club manager wants to determine whether the number of days per 
week that college students exercise is related to gender. A random sample of 
275 college students is selected and the results are classified as shown in the 
table. At α = 0.05, is there enough evidence to conclude that the number of
days a student exercises per week is related to gender?

Number of days of exercise per week
Gender | 0-1 | 2-3 | 4-5 | 6-7 | Total
Male | 40 | 53 | 26 | 6 | 125
Female | 34 | 68 | 37 | 11 | 150
Total | 74 | 121 | 63 | 17 | 275

SOLUTION Here are the null and alternative hypotheses.
$H_0$: The number of days of exercise per week is independent of gender.
$H_a$: The number of days of exercise per week depends on gender. (Claim)

Because $\text{d.f.} = (2-1)(4-1) = 3$ and α = 0.05, the critical value is $\chi_0^2 = 7.815$. So, the
rejection region is $\chi^2 > 7.815$. Using Minitab (see below), the test statistic
is $\chi^2 \approx 3.493$. Because $\chi^2 \approx 3.493$ is not in the rejection region, you fail to
reject the null hypothesis.


MINITAB

Tabulated Statistics: Gender, Number of days of exercise

Rows: Gender   Columns: Number of days of exercise

          0 to 1   2 to 3   4 to 5   6 to 7   All
Male          40       53       26        6   125
Female        34       68       37       11   150
All           74      121       63       17   275

Cell Contents: Count

Pearson Chi-Square = 3.493, DF = 3, P-Value = 0.322


Interpretation There is not enough evidence to conclude that the number of 
days a student exercises per week is related to gender. 


TRY IT YOURSELF 3 


A researcher wants to determine whether age is related to whether or not a 
tax credit would influence an adult to purchase a hybrid vehicle. A random 
sample of 1250 adults is selected and the results are classified as shown in the 
table. At α = 0.01, is there enough evidence to conclude that age is related to
the response? (Adapted from HNTB)

Age
Response | 18-34 | 35-54 | 55 and older | Total
Yes | 257 | 189 | 143 | 589
No | 218 | 261 | 182 | 661
Total | 475 | 450 | 325 | 1250


Answer: Page A38 




10.2 EXERCISES 


For Extra Help: MyLab Statistics 


Building Basic Skills and Vocabulary 


1. Explain how to find the expected frequency for a cell in a contingency table.

2. Explain the difference between marginal frequencies and joint frequencies in a contingency table.

3. Explain how the chi-square independence test and the chi-square goodness-of-fit test are similar. How are they different?

4. Explain why the chi-square independence test is always a right-tailed test.


True or False? In Exercises 5 and 6, determine whether the statement is true or
false. If it is false, rewrite it as a true statement.

5. If the two variables in a chi-square independence test are dependent, then you can expect little difference between the observed frequencies and the expected frequencies.

6. When the test statistic for the chi-square independence test is large, you will, in most cases, reject the null hypothesis.


Finding Expected Frequencies In Exercises 7-12, (a) calculate the
marginal frequencies and (b) find the expected frequency for each cell in the
contingency table. Assume that the variables are independent.

7.
Athlete has
Result | Stretched | Not stretched
Injury | 25 | 33
No injury | 220 | 156

8.
Treatment
Result | Drug | Placebo
Nausea | 42 | 33
No nausea | 212 | 239

9.
Preference
Students | Marking System | Grading System | No preference
High school | 225 | 95 | 35
Middle School | 150 | 7 | 10

10.
Rating
Size of restaurant | Excellent | Fair | Poor
Seats 100 or fewer | 210 | 156 | 174
Seats over 100 | 215 | 134 | 196

11.
Type of book
Gender | Science Fiction | Crime | Romance | Mythology
Male | 65 | 55 | 20 | 30
Female | 55 | 60 | 30 | 15

12.
Age
Type of movie rented | 18-24 | 25-34 | 35-44 | 45-64 | 65 and older
Comedy | 38 | 30 | 24 | 10
Action | 15 | 17 | 16 | 9 | 5
Drama | 12 | 11 | 19 | 25 | 13


Using and Interpreting Concepts 


Performing a Chi-Square Independence Test In Exercises 13-28,
perform the indicated chi-square independence test by performing the steps below.

(a) Identify the claim and state $H_0$ and $H_a$.
(b) Determine the degrees of freedom, find the critical value, and identify the rejection region.
(c) Find the chi-square test statistic.
(d) Decide whether to reject or fail to reject the null hypothesis.
(e) Interpret the decision in the context of the original claim.

13. Use the contingency table and expected frequencies from Exercise 7.
At α = 0.05, test the hypothesis that the variables are independent.

14. Use the contingency table and expected frequencies from Exercise 8.
At α = 0.01, test the hypothesis that the variables are dependent.

15. Musculoskeletal Injury The contingency table shows the results of a
random sample of patients with pain from musculoskeletal injuries treated
with acetaminophen or ibuprofen. At α = 0.10, can you conclude that
the treatment is related to the result? (Adapted from American Academy of
Pediatrics)

Treatment
Result | Acetaminophen | Ibuprofen
Significant improvement | 58 | 81
Slight improvement | 42 | 19

16. Attitudes about Safety The contingency table shows the results of a
random sample of students by type of school and their attitudes on safety
steps taken by the school staff. At α = 0.01, can you conclude that attitudes
about the safety steps taken by the school staff are related to the type of
school? (Adapted from Horatio Alger Association)

School staff has
Type of school | Taken all steps necessary for student safety | Taken some steps toward student safety
Public | 40 | 51
Private | 64 | 34




17. Trying to Quit Smoking The contingency table shows the results of a
random sample of former smokers by the number of times they tried to
quit smoking before they were habit-free and gender. At α = 0.05, can
you conclude that the number of times they tried to quit before they were
habit-free is related to gender? (Adapted from Porter Novelli HealthStyles for the
American Lung Association)

Number of times tried to quit before habit-free
Gender | 1 | 2-3 | 4 or more
Male | 271 | 257 | 149
Female | 146 | 139 | 80

18. Achievement and School Location The contingency table shows the results
of a random sample of students by the location of school and the number of
those students achieving basic skill levels in three subjects. At α = 0.01, test
the hypothesis that the variables are independent. (Adapted from HUD State
of the Cities Report)

Subject
Location of school | Reading | Math | Science
Urban | 43 | 42 | 38
Suburban | 63 | 66 | 65

19. Continuing Education You work for a college’s continuing education
department and want to determine whether the reasons given by workers
for continuing their education are related to job type. In your study, you
randomly collect the data shown in the contingency table. At α = 0.01, can
you conclude that the reason and the type of worker are dependent? (Adapted
from Market Research Institute for George Mason University)

Reason for continuing education
Type of worker | Professional | Personal | Professional and personal
Technical | 30 | 36 | 41
Other | 47 | 25 | 30

20. Ages and Goals You are investigating the relationship between the
ages of U.S. adults and what aspect of career development they consider
to be the most important. You randomly collect the data shown in the
contingency table. At α = 0.10, is there enough evidence to conclude that
age is related to which aspect of career development is considered to be most
important? (Adapted from The Harris Poll)

Career development aspect
Age | Learning new skills | Pay increases | Career path
18-26 years | 31 | 22 | 21
27-41 years | 21 | 31 | 33
42-61 years | 19 | 14 | 8


21. Borrowing for College The contingency table shows a random sample of
white, black, and Hispanic college students based on whether their family
borrowed money to pay for their college education. At α = 0.01, can you
conclude that borrowing money for college and race are related? (Adapted
from Sallie Mae)

Family borrowed money?
Race | Yes | No
White | 49 | 64
Black | 85 | 123
Hispanic | 85 | 180

22. Borrowing for College A financial aid officer is studying the relationship
between who borrows money to pay for college in a family and the income of
the family. As part of the study, 1593 families are randomly selected and the
resulting data are organized as shown in the contingency table. At α = 0.01,
can you conclude that who borrows money for college in a family is related
to the income of the family? (Adapted from Sallie Mae)

Who borrowed money
Family income | Student only | Parent only | Both | No one
Less than $35,000 | 149 | 34 | 10 | 311
$35,000-$100,000 | 181 | 68 | 58 | 421
Greater than $100,000 | 69 | 40 | 14 | 238

23. Vehicles and Crashes You work for an insurance company and are studying
the relationship between types of crashes and the vehicles involved in passenger
vehicle occupant deaths. As part of your study, you randomly select 4270 vehicle
crashes and organize the resulting data as shown in the contingency table.
At α = 0.05, can you conclude that the type of crash depends on the type of
vehicle? (Adapted from Insurance Institute for Highway Safety)

Vehicle
Type of crash | Car | Pickup | Sport utility
Single-vehicle | 1059 | 507 | 491
Multiple-vehicle | 1476 | 354 | 383

24. Alcohol-Related Accidents The contingency table shows the results of
a random sample of fatally injured passenger vehicle drivers (with blood
alcohol concentrations greater than or equal to 0.08) by age and gender.
At α = 0.05, can you conclude that age is related to gender in such
alcohol-related accidents? (Adapted from Insurance Institute for Highway Safety)

Age
Gender | 16-20 | 21-30 | 31-40 | 41-50 | 51-60 | 61 and older
Male | 31 | 147 | 95 | 67 | 57 | 42
Female | 9 | 36 | 25 | 17 | 15 | 9


TABLE FOR EXERCISE 30
Treatment
Result | Drug | Placebo
Improvement | 39 | 25
No change | 54 | 70

25. Use the contingency table and expected frequencies from Exercise 9.
At α = 0.01, test the hypothesis that the variables are dependent.

26. Use the contingency table and expected frequencies from Exercise 10.
At α = 0.05, test the hypothesis that the variables are dependent.

27. Use the contingency table and expected frequencies from Exercise 11.
At α = 0.05, test the hypothesis that the variables are independent.

28. Use the contingency table and expected frequencies from Exercise 12.
At α = 0.10, test the hypothesis that the variables are dependent.


Extending Concepts 


Homogeneity of Proportions Test In Exercises 29-32, use this information
about the homogeneity of proportions test. Another chi-square test that involves
a contingency table is the homogeneity of proportions test. This test is used to
determine whether several proportions are equal when samples are taken from
different populations. Before the populations are sampled and the contingency
table is made, the sample sizes are determined. After randomly sampling different
populations, you can test whether the proportion of elements in a category
is the same for each population using the same guidelines as the chi-square
independence test. The null and alternative hypotheses are always some variation
of these statements.

$H_0$: The proportions are equal.
$H_a$: At least one of the proportions is different from the others.

Performing a homogeneity of proportions test requires that the observed
frequencies be obtained using a random sample and that each expected frequency
be greater than or equal to 5.


29. Motor Vehicle Crash Deaths The contingency table shows the results
of a random sample of motor vehicle crash deaths by age and gender. At
α = 0.05, perform a homogeneity of proportions test on the claim that the
proportions of motor vehicle crash deaths involving males or females are the
same for each age group. (Adapted from Insurance Institute for Highway Safety)

Age
Gender | 16-24 | 25-34 | 35-44 | 45-54 | 55-64 | 65-74 | 75-84 | 85 and older
Male | 96 | 98 | 72 | 80 | 74 | 44 | 25 | 12
Female | 39 | 33 | 25 | 29 | 26 | 21 | 16 | 10

30. Obsessive-Compulsive Disorder The contingency table at the left shows
the results of a random sample of patients with obsessive-compulsive
disorder after being treated with a drug or with a placebo. At α = 0.10,
perform a homogeneity of proportions test on the claim that the proportions
of the results for drug and placebo treatments are the same. (Adapted from
The Journal of the American Medical Association)




31. Is the chi-square homogeneity of proportions test a left-tailed, right-tailed, 
or two-tailed test? 


32. Explain how the chi-square independence test is different from the 
chi-square homogeneity of proportions test. 


Contingency Tables and Relative Frequencies In Exercises 33-36, use
the information below.

The frequencies in a contingency table can be written as relative frequencies
by dividing each frequency by the sample size. The contingency table below
shows the number of U.S. adults (in millions) ages 25 and over by employment
status and educational attainment. (Adapted from U.S. Census Bureau)

Educational attainment
Status | Not a high school graduate | High school graduate | Some college, no degree | Associate’s, bachelor’s, or advanced degree
Employed | 10.0 | 33.5 | 21.7 | 67.1
Unemployed | 0.9 | 2.1 | 1.1 | 1.9
Not in the labor force | 12.6 | 26.4 | 13.2 | 24.6


33. Rewrite the contingency table using relative frequencies. 


34. Explain why you cannot perform the chi-square independence test on 
these data. 


35. What percent of U.S. adults ages 25 and over (a) have a degree and are 
unemployed and (b) have some college education, but no degree, and are 
not in the labor force? 


36. What percent of U.S. adults ages 25 and over (a) are employed and are only 
high school graduates, (b) are not in the labor force, and (c) are not high 
school graduates? 


Conditional Relative Frequencies In Exercises 37-42, use the contingency
table from Exercises 33-36 and the information below.


Relative frequencies can also be calculated based on the row totals (by 
dividing each row entry by the row’s total) or the column totals (by dividing 
each column entry by the column’s total). These frequencies are conditional 
relative frequencies and can be used to determine whether an association 
exists between two categories in a contingency table. 
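Both kinds of relative frequencies are simple divisions. A minimal Python sketch, applied here to the ice cream table from earlier in this section (the function names are hypothetical):

```python
def joint_relative_frequencies(table):
    """Divide every cell by the sample size (the grand total)."""
    n = sum(sum(row) for row in table)
    return [[f / n for f in row] for row in table]


def row_conditional_relative_frequencies(table):
    """Divide each row entry by that row's total."""
    return [[f / sum(row) for f in row] for row in table]


def column_conditional_relative_frequencies(table):
    """Divide each column entry by that column's total."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[f / c for f, c in zip(row, col_totals)] for row in table]


ice_cream = [[504, 287, 182, 43, 53], [474, 401, 158, 45, 50]]  # Male, Female
print(round(joint_relative_frequencies(ice_cream)[0][0], 3))               # 0.229
print(round(row_conditional_relative_frequencies(ice_cream)[0][0], 3))     # 0.471
print(round(column_conditional_relative_frequencies(ice_cream)[0][0], 3))  # 0.515
```

Each row of the row-based table sums to 1, and each column of the column-based table sums to 1.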


37. Calculate the conditional relative frequencies in the contingency table based 
on the row totals. 


38. What percent of U.S. adults ages 25 and over who are employed have 
a degree? 


39. What percent of U.S. adults ages 25 and over who are not in the labor force 
have some college education, but no degree? 


40. Calculate the conditional relative frequencies in the contingency table based 
on the column totals. 


41. What percent of U.S. adults ages 25 and over who have a degree are not in 
the labor force? 


42. What percent of U.S. adults ages 25 and over who are not high school 
graduates are unemployed? 


Food Safety Survey

In your opinion, how safe is the food you buy? CBS News polled 1048 U.S. adults
and asked them the question below.

Overall, how confident are you that the food you buy is safe to eat: very
confident, somewhat confident, not too confident, not at all confident?

The pie chart shows the responses to the question. You conduct a survey using
the same question. The contingency table shows the results of your survey classified
by gender.

Figure: pie chart, "How Confident Are You That the Food You Buy Is Safe to Eat?" Very confident 32%; Somewhat confident 52%; Not too confident 14%; Not at all confident 2%.

Gender
Response | Female | Male
Very confident | 96 | 160
Somewhat confident | 232 | 180
Not too confident | 56 | 52
Not at all confident | 12 | 4


EXERCISES

1. Assuming the variables gender and response are independent, did the number of female respondents or male respondents exceed the expected number of “very confident” responses?

2. Assuming the variables gender and response are independent, did the number of female respondents or male respondents exceed the expected number of “somewhat confident” responses?

3. At α = 0.01, perform a chi-square independence test to determine whether the variables response and gender are independent. What can you conclude?

In Exercises 4 and 5, perform a chi-square goodness-of-fit test to compare the distribution of responses shown in the pie chart with the distribution of your survey results for each gender. Use the distribution shown in the pie chart as the expected distribution. Use α = 0.05.

4. Compare the distribution of responses by females with the expected distribution. What can you conclude?

5. Compare the distribution of responses by males with the expected distribution. What can you conclude?

6. In addition to the variables used in the Case Study, what other variables do you think are important to consider when studying the distribution of U.S. consumers’ attitudes about food safety?


SECTION 10.3 Comparing Two Variances 571 


10.3 Comparing Two Variances 


What You Should Learn

» How to interpret the F-distribution and use an F-table to find critical values

» How to perform a two-sample F-test to compare two variances

The F-Distribution • The Two-Sample F-Test for Variances

The F-Distribution

In Chapter 8, you learned how to perform hypothesis tests to compare population
means and population proportions. Recall from Section 8.2 that the t-test for the
difference between two population means depends on whether the population
variances are equal. To determine whether the population variances are equal,
you can perform a two-sample F-test.
In this section, you will learn about the F-distribution and how it can be used
to compare two variances. As you read the next definition, recall that the sample
variance $s^2$ is the square of the sample standard deviation $s$.


DEFINITION 


Let $s_1^2$ and $s_2^2$ represent the sample variances of two different populations.
If both populations are normal and the population variances $\sigma_1^2$ and $\sigma_2^2$ are
equal, then the sampling distribution of

$$F = \frac{s_1^2}{s_2^2}$$

is an F-distribution. Here are several properties of the F-distribution.

The F-distribution is a family of curves, each of which is determined by two
types of degrees of freedom: the degrees of freedom corresponding to the
variance in the numerator, denoted by d.f.N, and the degrees of freedom
corresponding to the variance in the denominator, denoted by d.f.D.

The F-distribution is positively skewed and therefore the distribution is
not symmetric (see figure below).

The total area under each F-distribution curve is equal to 1.

All values of F are greater than or equal to 0.

For all F-distributions, the mean value of F is approximately equal to 1.


Figure: F-Distribution for Different Degrees of Freedom (curves for d.f.N = 1 and d.f.D = 8; d.f.N = 8 and d.f.D = 26; d.f.N = 16 and d.f.D = 7; d.f.N = 3 and d.f.D = 11).


For unequal variances, designate the greater sample variance as $s_1^2$. So, in the
sampling distribution of $F = s_1^2/s_2^2$, the variance in the numerator is greater than
or equal to the variance in the denominator. This means that F is always greater
than or equal to 1. As such, all one-tailed tests are right-tailed tests, and for all
two-tailed tests, you need only to find the right-tailed critical value.
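A minimal sketch of forming F with the greater sample variance in the numerator. It assumes the usual d.f. = n - 1 for each sample, uses Python's standard `statistics.variance` (the sample variance $s^2$), and the function name and both data sets are hypothetical, for illustration only.

```python
from statistics import variance


def two_sample_f(sample_1, sample_2):
    """Return (F, d.f.N, d.f.D) with the greater sample variance in the numerator.

    Assumes d.f. = n - 1 for each sample and that neither variance is zero.
    """
    var_1, var_2 = variance(sample_1), variance(sample_2)
    if var_1 >= var_2:
        return var_1 / var_2, len(sample_1) - 1, len(sample_2) - 1
    # Swap so the larger variance is on top; F is then always >= 1
    return var_2 / var_1, len(sample_2) - 1, len(sample_1) - 1


# Hypothetical samples, for illustration only
small_spread = [3, 4, 5, 6]        # sample variance 5/3
large_spread = [2, 4, 6, 8, 10]    # sample variance 10
f, df_n, df_d = two_sample_f(small_spread, large_spread)
print(round(f, 3), df_n, df_d)  # 6.0 4 3
```

Swapping the order of the arguments gives the same result, which is the point of always designating the greater variance as $s_1^2$.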




Table 7 in Appendix B lists the critical values for the F-distribution for
selected levels of significance α and degrees of freedom d.f.N and d.f.D.


GUIDELINES 


Finding Critical Values for the F-Distribution 
1. Specify the level of significance α.
2. Determine the degrees of freedom for the numerator, d.f.N.
3. Determine the degrees of freedom for the denominator, d.f.D.
4. Use Table 7 in Appendix B to find the critical value. When the hypothesis test is
   a. one-tailed, use the α F-table.
   b. two-tailed, use the ½α F-table.


Note that because F is always greater than or equal to 1, all one-tailed 
tests are right-tailed tests. For two-tailed tests, you need only to find the 
right-tailed critical value. 
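Step 4 reduces to choosing which table to read. A tiny Python sketch of that choice (the function name `f_table_alpha` is hypothetical); if SciPy is installed, `scipy.stats.f.ppf(1 - table_alpha, dfn, dfd)` returns the corresponding right-tailed critical value without a table.

```python
def f_table_alpha(alpha: float, two_tailed: bool) -> float:
    """Level of significance of the F-table to read, following the guidelines.

    One-tailed (always right-tailed here): use the alpha table.
    Two-tailed: use the (1/2)alpha table and find only the right-tailed value.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    return alpha / 2 if two_tailed else alpha


print(f_table_alpha(0.10, two_tailed=False))  # 0.1    (right-tailed, Example 1)
print(f_table_alpha(0.05, two_tailed=True))   # 0.025  (two-tailed, Example 2)
```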


In Examples 1 and 2, the values of d.f.N and d.f.D are given. You will learn
how to determine these values on page 574.


Finding a Critical F-Value for a Right-Tailed Test 

Find the critical F-value for a right-tailed test when α = 0.10, d.f.N = 5, and
d.f.D = 28.

SOLUTION

A portion of Table 7 is shown below. Using the α = 0.10 F-table with d.f.N = 5
and d.f.D = 28, you can find the critical value, as shown by the highlighted
areas in the table.


d.f.p: a = 0.10 
Degrees of d.f..y: Degrees of freedom, numerator 
freedom, 
denominator 1 2 3 4 6 7 8 
1 39.86 49.50 53.59 55.83 58.20 58.91 59.44 
2 8.53 9.00 9.16 9.24 Css) G6) S).837/ 
) 26 2h)  Aiye eshl DAZ 2.01 1.96 1.92 
27 2.90 2.51 2.30 2.17 2.00 1.95 1.91 
2.00 1.94 1.90 
29 2.89 2.50 2.28 2.15 2.06 1.99 193 1.89 
30 Pikes) AGN) Pes) Al Boe; = {8} GB} Ast} 


From the table, you can see that the critical value is

F₀ = 2.06. (Critical value)

The figure at the left shows the F-distribution for α = 0.10, d.f.N = 5,
d.f.D = 28, and F₀ = 2.06.


TRY IT YOURSELF 1 


Find the critical F-value for a right-tailed test when α = 0.05, d.f.N = 8, and
d.f.D = 20.
Answer: Page A39 


Study Tip 


When using Table 7 in Appendix B to find a critical value, you will notice
that some of the values for d.f.N or d.f.D are not included in the table. If
the number you need for d.f.N or d.f.D is exactly midway between two values
in the table, then use the critical value midway between the corresponding
critical values. In some cases, though, it is easier to use technology to
calculate the P-value, compare it to the level of significance, and then
decide whether to reject the null hypothesis.


SECTION 10.3 Comparing Two Variances 573 


When performing a two-tailed hypothesis test using the F-distribution, you
need to find only the right-tailed critical value. You must, however, remember
to use the ½α F-table.


Finding a Critical F-Value for a Two-Tailed Test 


Find the critical F-value for a two-tailed test when α = 0.05, d.f.N = 4, and
d.f.D = 8.


SOLUTION

A portion of Table 7 is shown below. Using the ½α = ½(0.05) = 0.025
F-table with d.f.N = 4 and d.f.D = 8, you can find the critical value, as shown
by the highlighted areas in the table.


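α = 0.025 (rows: d.f.D, degrees of freedom, denominator; columns: d.f.N, degrees of freedom, numerator; … marks entries illegible in this copy)
d.f.D | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
1 | 647.8 | 799.5 | 864.2 | … | 921.8 | 937.1 | 948.2 | 956.7
2 | 38.51 | 39.00 | 39.17 | … | 39.30 | 39.33 | 39.36 | 39.37
3 | 17.44 | 16.04 | 15.44 | … | 14.88 | 14.73 | 14.62 | 14.54
4 | 12.22 | … | … | … | … | … | … | …
5 | 10.01 | 8.43 | 7.76 | … | 7.15 | 6.98 | 6.85 | 6.76
6 | 8.81 | 7.26 | … | … | … | … | … | …
7 | 8.07 | 6.54 | … | … | 5.29 | 5.12 | 4.99 | 4.90
8 | … | … | … | 5.05 | 4.82 | 4.65 | 4.53 | 4.43
9 | 7.21 | 5.71 | 5.08 | 4.72 | 4.48 | 4.32 | 4.20 | 4.10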


From the table, the critical value is
F₀ = 5.05. (Critical value)


The figure shows the F-distribution for ½α = 0.025, d.f.N = 4, d.f.D = 8, and
F₀ = 5.05.


TRY IT YOURSELF 2 


Find the critical F-value for a two-tailed test when α = 0.01, d.f.N = 2, and
d.f.D = 5.
Answer: Page A39 




The Two-Sample F-Test for Variances 


In the remainder of this section, you will learn how to perform a two-sample 
F-test for comparing two population variances using a sample from each 
population. 


Two-Sample F-Test for Variances 


A two-sample F-test is used to compare two population variances σ₁² and σ₂².
To perform this test, these conditions must be met.


1. The samples must be random.

2. The samples must be independent.

3. Each population must have a normal distribution.
The test statistic is

$$F = \frac{s_1^2}{s_2^2}$$

where s₁² and s₂² represent the sample variances with s₁² ≥ s₂². The numerator has
d.f.N = n₁ − 1 degrees of freedom and the denominator has d.f.D = n₂ − 1
degrees of freedom, where n₁ is the size of the sample having variance
s₁² and n₂ is the size of the sample having variance s₂².
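The bookkeeping in this box (put the greater variance in the numerator, then pair each degrees-of-freedom value with the matching sample) is easy to get backwards. A minimal Python sketch follows; the function names `two_sample_f` and `decide` are hypothetical, and the critical values passed to `decide` are the Table 7 values quoted in this section's Examples 3 and 4 rather than anything the code computes.

```python
def two_sample_f(var_a, n_a, var_b, n_b):
    """Return (F, d.f.N, d.f.D), putting the greater sample variance in the numerator."""
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    return var_b / var_a, n_b - 1, n_a - 1


def decide(f_stat, f_critical):
    """Decision rule for the rejection region F > F0 (right tail only)."""
    return "reject H0" if f_stat > f_critical else "fail to reject H0"


# Example 3 (waiting times): variance 400 with n = 10, variance 256 with n = 21.
# F0 = 1.96 is taken from Table 7, not computed here.
F, dfn, dfd = two_sample_f(400, 10, 256, 21)
print(round(F, 2), dfn, dfd, decide(F, 1.96))  # 1.56 9 20 fail to reject H0

# Example 4 (stocks): square the standard deviations first; F0 = 2.09 from Table 7.
F, dfn, dfd = two_sample_f(3.5 ** 2, 30, 5.7 ** 2, 31)
print(round(F, 2), dfn, dfd, decide(F, 2.09))  # 2.65 30 29 reject H0
```

Because the greater variance always goes on top, one right-tailed comparison covers both one-tailed and two-tailed tests; only the table used for F₀ (α or ½α) changes.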


GUIDELINES 


Using a Two-Sample F-Test to Compare σ₁² and σ₂²
In Words | In Symbols

1. Verify that the samples are random and independent, and the populations have normal distributions.
2. Identify the claim. State the null and alternative hypotheses. | State H₀ and Hₐ.
3. Specify the level of significance. | Identify α.
4. Identify the degrees of freedom for the numerator and the denominator. | d.f.N = n₁ − 1, d.f.D = n₂ − 1
5. Determine the critical value. | Use Table 7 in Appendix B.
6. Determine the rejection region.
7. Find the test statistic and sketch the sampling distribution. | F = s₁²/s₂²
8. Make a decision to reject or fail to reject the null hypothesis. | If F is in the rejection region, then reject H₀. Otherwise, fail to reject H₀.
9. Interpret the decision in the context of the original claim.


In some cases, you will be given the sample standard deviations s₁ and s₂.
Remember to square both standard deviations to calculate the sample variances
s₁² and s₂² before using a two-sample F-test to compare variances.


Picturing the World


Does location have an effect on 
the variance of real estate selling 
prices? A random sample of 
selling prices (in thousands of 
dollars) of existing homes sold 
in the California counties of Los 
Angeles and San Diego is shown 
in the table. (Adapted from California 
Association of Realtors) 


Los Angeles San Diego 


440 634 
342 378 
494 652 
598 659 
590 695 
643 776 
252 425 
447 594 
580 645 
361 546 


Assuming each population of selling prices is normally distributed, is it
possible to use a two-sample F-test to compare the population variances?


Normal solution | Treated solution
n = 25 | n = 20
s² = 180 | s² = 56




Performing a Two-Sample F-Test 


A restaurant manager is designing a system that is intended to decrease the 
variance of the time customers wait before their meals are served. Under the 
old system, a random sample of 10 customers had a variance of 400. Under 
the new system, a random sample of 21 customers had a variance of 256. At 
α = 0.10, is there enough evidence to convince the manager to switch to the
new system? Assume both populations are normally distributed. 


SOLUTION 

Because 400 > 256, s₁² = 400 and s₂² = 256. Therefore, s₁² and σ₁² represent
the sample and population variances for the old system, respectively. With the
claim “the variance of the waiting times under the new system is less than the
variance of the waiting times under the old system,” the null and alternative
hypotheses are

H₀: σ₁² ≤ σ₂² and Hₐ: σ₁² > σ₂². (Claim)


Note that the test is a right-tailed test with α = 0.10, and the degrees of
freedom are

d.f.N = n₁ − 1 = 10 − 1 = 9

and

d.f.D = n₂ − 1 = 21 − 1 = 20.


So, the critical value is F₀ = 1.96 and the rejection region is F > 1.96. The test
statistic is

$$F = \frac{s_1^2}{s_2^2} = \frac{400}{256} \approx 1.56.$$


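The figure shows the location of the rejection region and the test statistic F.
Because F is not in the rejection region, you fail to reject the null hypothesis.

[Figure: F-distribution with right-tail area α = 0.10 beyond F₀ = 1.96; the test statistic F ≈ 1.56 lies to the left of F₀.]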


Interpretation There is not enough evidence at the 10% level of significance 
to convince the manager to switch to the new system. 


TRY IT YOURSELF 3 


A medical researcher claims that a specially treated intravenous solution 
decreases the variance of the time required for nutrients to enter the 
bloodstream. Independent samples from each type of solution are randomly 
selected, and the results are shown in the table at the left. At α = 0.01, is there
enough evidence to support the researcher’s claim? Assume the populations
are normally distributed.
Answer: Page A39 




Stock A | Stock B
n₂ = 30 | n₁ = 31
s₂ = 3.5 | s₁ = 5.7


Location A | Location B
n = 16 | n = 22
s = 0.95 | s = 0.78


Using Technology for a Two-Sample F-Test 


You want to purchase stock in a company and are deciding between two 
different stocks. Because a stock’s risk can be associated with the standard 
deviation of its daily closing prices, you randomly select samples of the 
daily closing prices for each stock to obtain the results shown at the left. At 
α = 0.05, can you conclude that one of the two stocks is a riskier investment?
Assume the stock closing prices are normally distributed. 


SOLUTION 


Because 5.7² > 3.5², s₁² = 5.7² and s₂² = 3.5². Therefore, s₁² and σ₁² represent
the sample and population variances for Stock B, respectively. With the
claim “one of the two stocks is a riskier investment,” the null and alternative
hypotheses are


H₀: σ₁² = σ₂² and Hₐ: σ₁² ≠ σ₂². (Claim)


Note that the test is a two-tailed test with ½α = ½(0.05) = 0.025, and
the degrees of freedom are d.f.N = n₁ − 1 = 31 − 1 = 30 and
d.f.D = n₂ − 1 = 30 − 1 = 29. So, the critical value is F₀ = 2.09 and the
rejection region is F > 2.09.


To perform a two-sample F-test using a TI-84 Plus, begin with the STAT
keystroke. Choose the TESTS menu and select F:2-SampFTest. Then set up
the two-sample F-test as shown in the first screen below. Because you are
entering the descriptive statistics, select the Stats input option. When entering
the original data, select the Data input option. The other displays below show
the results of selecting Calculate or Draw.


[TI-84 Plus screens: 2-SampFTest setup with Inpt: Stats, Sx1 = 5.7, n1 = 31, Sx2 = 3.5, n2 = 30, and σ1: ≠σ2; the Calculate and Draw results show p = .0102172459.]


The test statistic F ≈ 2.65 is in the rejection region, so you reject the null
hypothesis. 


Interpretation There is enough evidence at the 5% level of significance to 
support the claim that one of the two stocks is a riskier investment. 


TRY IT YOURSELF 4 


A biologist claims that the pH levels of the soil in two geographic locations 
have equal standard deviations. Independent samples from each location are 
randomly selected, and the results are shown at the left. At α = 0.01, is there
enough evidence to reject the biologist’s claim? Assume the pH levels are
normally distributed.
Answer: Page A39 


You can also use a P-value to perform a two-sample F-test. For instance, in 
Example 4, note that the TI-84 Plus displays P = .0102172459. Because P < α,
you reject the null hypothesis. 
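The P-value is the area under the F-distribution beyond the test statistic; for a two-tailed test with the greater variance in the numerator, it is twice the right-tail area. The Python sketch below recomputes the Example 4 P-value from F = 5.7²/3.5², d.f.N = 30, and d.f.D = 29. It assumes a continued-fraction evaluation of the regularized incomplete beta function (a Numerical Recipes-style routine, not the TI-84 Plus algorithm), and the helper names are hypothetical.

```python
import math

# Sketch only: helper names are hypothetical; the numerical method is an assumption.


def _beta_cf(a, b, x, max_iter=300, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(x, a, b):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_right_tail_area(f, dfn, dfd):
    """Area to the right of f under the F-distribution with (dfn, dfd) degrees of freedom."""
    if f <= 0.0:
        return 1.0
    x = dfn * f / (dfn * f + dfd)
    return 1.0 - reg_inc_beta(x, dfn / 2.0, dfd / 2.0)


F = 5.7 ** 2 / 3.5 ** 2                      # greater variance in the numerator
p_value = 2 * f_right_tail_area(F, 30, 29)   # two-tailed: double the right-tail area
alpha = 0.05
print(f"F = {F:.4f}, P = {p_value:.7f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The printed P-value should agree closely with the calculator's .0102172459, and because P < α = 0.05 the decision is again to reject H₀.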




10.3 EXERCISES


Building Basic Skills and Vocabulary 
1. Explain how to find the critical value for an F-test. 
2. List five properties of the F-distribution. 


3. List the three conditions that must be met in order to use a two-sample 
F-test. 


4. Explain how to determine the values of d.f.N and d.f.D when performing a
two-sample F-test. 


Finding a Critical F-Value for a Right-Tailed Test In Exercises 5–8,
find the critical F-value for a right-tailed test using the level of significance α and
degrees of freedom d.f.N and d.f.D.


5. α = 0.05, d.f.N = 9, d.f.D = 16
6. α = 0.01, d.f.N = …, d.f.D = 11
7. α = …, d.f.N = …, d.f.D = 15
8. α = 0.025, d.f.N = 7, d.f.D = 3

Finding a Critical F-Value for a Two-Tailed Test In Exercises 9–12,
find the critical F-value for a two-tailed test using the level of significance α and
degrees of freedom d.f.N and d.f.D.


9. α = 0.01, d.f.N = 6, d.f.D = 7
10. α = 0.10, d.f.N = 24, d.f.D = 28
11. α = 0.05, d.f.N = 60, d.f.D = 40
12. α = 0.05, d.f.N = 27, d.f.D = 19
In Exercises 13–18, test the claim about the difference between two population
variances σ₁² and σ₂² at the level of significance α. Assume the samples are random
and independent, and the populations are normally distributed.


13. Claim: σ₁² > σ₂²; α = 0.10. Sample statistics: s₁² = 773, n₁ = 5 and s₂² = 765, n₂ = 6
14. Claim: σ₁² = σ₂²; α = 0.05. Sample statistics: s₁² = 310, n₁ = 7 and s₂² = 297, n₂ = 8
15. Claim: σ₁² = σ₂²; α = 0.01. Sample statistics: s₁² = 842, n₁ = 11 and s₂² = 836, n₂ = 10
16. Claim: σ₁² ≠ σ₂²; α = 0.05. Sample statistics: s₁² = 245, n₁ = 31 and s₂² = 112, n₂ = 28
17. Claim: σ₁² = σ₂²; α = 0.01. Sample statistics: s₁² = 9.8, n₁ = 13 and s₂² = 2.5, n₂ = 20
18. Claim: σ₁² > σ₂²; α = 0.05. Sample statistics: s₁² = 44.6, n₁ = 16 and s₂² = 39.3, n₂ = 12


Using and Interpreting Concepts 


Performing a Two-Sample F-Test In Exercises 19–26, (a) identify the
claim and state H₀ and Hₐ, (b) find the critical value and identify the rejection
region, (c) find the test statistic F, (d) decide whether to reject or fail to reject
the null hypothesis, and (e) interpret the decision in the context of the original
claim. Assume the samples are random and independent, and the populations are
normally distributed.


19. Drunk-driving Accidents City A claims that the variance of its drunk-driving
accidents is less than the variance of the drunk-driving accidents in City B.
A sample of the drunk-driving cases of 15 of City A’s accidents has a variance
of 1.9. A sample of the drunk-driving cases of 20 of City B’s accidents has a
variance of 3.6. At α = 0.01, can you support City A’s claim?




TABLE FOR EXERCISE 21 (waiting times, in days)
18–34: 208, 229, 223, 210, 213, 168
35–49: 229, 217, 218, 245, 222, 256, 232, 236, 244


TABLE FOR EXERCISE 22 (driving distances, in yards)
Golfer 1: 227, 246, 231, 248, 234, 235, 223, 268, 235, 245
Golfer 2: 262, 269, 258, 262, 257, 258, 253, 262, 265, 255


20. Fuel Consumption An automobile manufacturer claims that the variance
of the fuel consumptions for its hybrid vehicles is less than the variance of
the fuel consumptions for the hybrid vehicles of a top competitor. A sample
of the fuel consumptions of 19 of the manufacturer’s hybrids has a variance
of 0.21. A sample of the fuel consumptions of 21 of its competitor’s hybrids
has a variance of 0.34. At α = 0.10, can you support the manufacturer’s
claim? (Adapted from GreenHybrid)


21. Heart Transplant Waiting Times The table at the left shows a sample of
the waiting times (in days) for a heart transplant for two age groups.
At α = 0.05, can you conclude that the variances of the waiting times
differ between the two age groups? (Adapted from Organ Procurement and
Transplantation Network)


22. Golf The table at the left shows a sample of the driving distances (in
yards) for two golfers. At α = 0.10, can you conclude that the variances
of the driving distances differ between the two golfers?


23. Science Assessment Tests A state school administrator claims that the
standard deviations of science assessment test scores for eighth-grade
students are the same in Districts 1 and 2. A sample of 12 test scores from
District 1 has a standard deviation of 36.8 points, and a sample of 14 test
scores from District 2 has a standard deviation of 32.5 points. At α = 0.10,
can you reject the administrator’s claim? (Adapted from National Center for
Education Statistics)


24. U.S. History Assessment Tests A state school administrator claims that the
standard deviations of U.S. history assessment test scores for eighth-grade
students are the same in Districts 1 and 2. A sample of 10 test scores from
District 1 has a standard deviation of 30.9 points, and a sample of 13 test
scores from District 2 has a standard deviation of 27.2 points. At α = 0.01,
can you reject the administrator’s claim? (Adapted from National Center for
Education Statistics)


25. Annual Salaries An employment information service claims that the
standard deviation of the annual salaries for actuaries is less in California
than in New York. You select a sample of actuaries from each state. The
results of each survey are shown in the figure. At α = 0.05, can you support
the service’s claim? (Adapted from America’s Career InfoNet)


FIGURE FOR EXERCISE 25
Actuaries in New York: s₁ = $37,100, n₁ = 41
Actuaries in California: s₂ = $32,400, n₂ = 61

FIGURE FOR EXERCISE 26
Public relations managers in Louisiana: s₁ = $42,000, n₁ = 24
Public relations managers in Florida: s₂ = $36,300, n₂ = 28


26. Annual Salaries An employment information service claims that the
standard deviation of the annual salaries for public relations managers is
less in Florida than in Louisiana. You select a sample of public relations
managers from each state. The results of each survey are shown in the figure.
At α = 0.05, can you support the service’s claim? (Adapted from America’s
Career InfoNet)




Extending Concepts 


Finding Left-Tailed Critical F-Values In this section, you only needed to 
calculate the right-tailed critical F-value for a two-tailed test. For other applications 
of the F-distribution, you will need to calculate the left-tailed critical F-value. To 
calculate the left-tailed critical F-value, perform the steps below. 


(1) Interchange the values for d.f.y and d.f.p. 


(2) Find the corresponding F-value in Table 7. 


(3) Calculate the reciprocal of the F-value to obtain the left-tailed critical F-value. 
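These three steps work because if F has an F-distribution with d.f.N and d.f.D degrees of freedom, then 1/F has an F-distribution with the degrees of freedom interchanged, so a left-tail area of ½α for F corresponds to a right-tail area of ½α for 1/F. The Python sketch below follows steps (1) to (3) for the setup of Example 2 (α = 0.05, d.f.N = 4, d.f.D = 8), with a numerically computed F-value standing in for Table 7 (an assumption; the table remains the intended tool). It then checks that the area to the left of the result is ½α. The helper names and the continued-fraction method are assumptions made for this illustration.

```python
import math

# Sketch only: helper names are hypothetical; a numerical F-value replaces Table 7.


def _beta_cf(a, b, x, max_iter=300, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(x, a, b):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_right_tail_area(f, dfn, dfd):
    """Area to the right of f under the F-distribution with (dfn, dfd) degrees of freedom."""
    if f <= 0.0:
        return 1.0
    x = dfn * f / (dfn * f + dfd)
    return 1.0 - reg_inc_beta(x, dfn / 2.0, dfd / 2.0)


def f_critical(alpha, dfn, dfd, tol=1e-9):
    """Right-tailed critical value with area alpha to its right (bisection)."""
    lo, hi = 0.0, 1.0
    while f_right_tail_area(hi, dfn, dfd) > alpha:
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f_right_tail_area(mid, dfn, dfd) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


alpha, dfn, dfd = 0.05, 4, 8
right = f_critical(alpha / 2, dfn, dfd)       # right-tailed critical value F_R
left = 1.0 / f_critical(alpha / 2, dfd, dfn)  # steps (1)-(3): swap d.f., take the reciprocal
left_area = 1.0 - f_right_tail_area(left, dfn, dfd)
print(f"F_R = {right:.2f}, F_L = {left:.4f}, area left of F_L = {left_area:.4f}")
```

Swapping the degrees of freedom before taking the reciprocal is the step most often missed; the final check shows the left tail then holds exactly ½α.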


In Exercises 27 and 28, find the right- and left-tailed critical F-values for a
two-tailed test using the level of significance α and degrees of freedom d.f.N
and d.f.D.


27. α = 0.05, d.f.N = 6, d.f.D = 3
28. α = 0.10, d.f.N = 20, d.f.D = 15


Confidence Interval for σ₁²/σ₂² When s₁² and s₂² are the variances of
randomly selected, independent samples from normally distributed populations,
then a confidence interval for σ₁²/σ₂² is

$$\frac{s_1^2}{s_2^2}\cdot\frac{1}{F_R} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{s_1^2}{s_2^2}\cdot\frac{1}{F_L}$$

where F_R is the right-tailed critical F-value and F_L is the left-tailed critical F-value.
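The interval is the sample variance ratio divided by each critical value; because F_L < F_R, dividing by F_R gives the lower endpoint. A minimal Python sketch, assuming F_R and F_L have already been found (for instance, from Table 7 together with the reciprocal rule for the left tail). The function name and the input numbers in the usage line are hypothetical, chosen only to make the arithmetic easy to follow; they are not data from this section.

```python
def variance_ratio_ci(var1, var2, f_right, f_left):
    """Confidence interval for sigma1^2 / sigma2^2 from sample variances and critical F-values."""
    if not 0 < f_left < f_right:
        raise ValueError("expected 0 < F_L < F_R")
    ratio = var1 / var2
    return ratio / f_right, ratio / f_left  # (lower, upper)


# Hypothetical inputs: s1^2 = 8, s2^2 = 4, F_R = 4, F_L = 0.25.
lower, upper = variance_ratio_ci(8.0, 4.0, 4.0, 0.25)
print(lower, upper)  # 0.5 8.0
```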


In Exercises 29 and 30, construct the confidence interval for σ₁²/σ₂². Assume the
samples are random and independent, and the populations are normally distributed.


29. Cholesterol Contents In a recent study of the cholesterol contents of
grilled chicken sandwiches served at fast food restaurants, a nutritionist
found that random samples of sandwiches from Restaurant A and from
Restaurant B had the sample statistics shown in the table. Construct a
95% confidence interval for σ₁²/σ₂², where σ₁² and σ₂² are the variances of
the cholesterol contents of grilled chicken sandwiches from Restaurant A
and Restaurant B, respectively.


Cholesterol contents of grilled chicken sandwiches
Restaurant A | Restaurant B
s₁² = 10.89 | s₂² = 9.61
n₁ = 16 | n₂ = 12


30. Carbohydrate Contents In a recent study of the carbohydrate contents
of grilled chicken sandwiches served at fast food restaurants, a nutritionist
found that random samples of sandwiches from Restaurant A and from
Restaurant B had the sample statistics shown in the table. Construct a
95% confidence interval for σ₁²/σ₂², where σ₁² and σ₂² are the variances of
the carbohydrate contents of grilled chicken sandwiches from Restaurant A
and Restaurant B, respectively.


Carbohydrate contents of grilled chicken sandwiches
Restaurant A | Restaurant B
s₁² = 5.29 | s₂² = 3.61
n₁ = 16 | n₂ = 12




What You Should Learn 


• How to use one-way analysis of variance to test claims involving
three or more means

• An introduction to two-way analysis of variance


One-Way ANOVA • Two-Way ANOVA


One-Way ANOVA 


Suppose a medical researcher is analyzing the effectiveness of three types of 
pain relievers and wants to determine whether there is a difference in the mean 
lengths of time it takes the three medications to provide relief. To determine 
whether such a difference exists, the researcher can use the F-distribution 
together with a technique called analysis of variance. Because one independent 
variable is being studied, the process is called one-way analysis of variance. 


DEFINITION 


One-way analysis of variance is a hypothesis-testing technique that is used
to compare the means of three or more populations. Analysis of variance is
usually abbreviated as ANOVA.


To begin a one-way analysis of variance test, you should first state the null 
and alternative hypotheses. For a one-way ANOVA test, the null and alternative 
hypotheses are always similar to these statements. 


H₀: μ₁ = μ₂ = μ₃ = ⋯ = μₖ (All population means are equal.)


Hₐ: At least one mean is different from the others.


When you reject the null hypothesis in a one-way ANOVA test, you can 
conclude that at least one of the means is different from the others. Without 
performing more statistical tests, however, you cannot determine which of the 
means is different. 

Before performing a one-way ANOVA test, you must check that these 
conditions are satisfied. 


1. Each sample must be randomly selected from a normal, or approximately 
normal, population. 

2. The samples must be independent of each other. 

3. Each population must have the same variance. 


The test statistic for a one-way ANOVA test is the ratio of two variances: the
variance between samples and the variance within samples.

$$\text{Test statistic} = \frac{\text{Variance between samples}}{\text{Variance within samples}}$$

1. The variance between samples measures the differences related to the
treatment given to each sample. This variance, sometimes called the mean
square between, is denoted by MS_B.

2. The variance within samples measures the differences related to entries
within the same sample and is usually due to sampling error. This variance,
sometimes called the mean square within, is denoted by MS_W.


SECTION 10.4 Analysis of Variance 581 


One-Way Analysis of Variance Test 


To perform a one-way ANOVA test, these conditions must be met. 


1. Each of the k samples, k ≥ 3, must be randomly selected from a normal,
or approximately normal, population.


2. The samples must be independent of each other.
3. Each population must have the same variance.


If these conditions are met, then the sampling distribution for the test is
approximated by the F-distribution. The test statistic is

$$F = \frac{MS_B}{MS_W}$$

The degrees of freedom are

d.f.N = k − 1 (Degrees of freedom for numerator)
and
d.f.D = N − k (Degrees of freedom for denominator)

where k is the number of samples and N is the sum of the sample sizes.


If there is little or no difference between the means, then MS_B will be
approximately equal to MS_W, and the test statistic will be approximately 1. Values
of F close to 1 suggest that you should fail to reject the null hypothesis. However,
if one of the means differs significantly from the others, then MS_B will be greater
than MS_W, and the test statistic will be greater than 1. Values of F significantly
greater than 1 suggest that you should reject the null hypothesis. So, all one-way
ANOVA tests are right-tailed tests. That is, if the test statistic is greater than the
critical value, then H₀ will be rejected.


GUIDELINES 


Finding the Test Statistic for a One-Way ANOVA Test
In Words | In Symbols

1. Find the mean and variance of each sample. | $\bar{x} = \frac{\sum x}{n}$, $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
2. Find the mean of all entries in all samples (the grand mean). | $\bar{\bar{x}} = \frac{\sum x}{N}$
3. Find the sum of squares between the samples. | $SS_B = \sum n_i(\bar{x}_i - \bar{\bar{x}})^2$
4. Find the sum of squares within the samples. | $SS_W = \sum (n_i - 1)s_i^2$
5. Find the variance between the samples. | $MS_B = \frac{SS_B}{\text{d.f.}_N} = \frac{\sum n_i(\bar{x}_i - \bar{\bar{x}})^2}{k - 1}$
6. Find the variance within the samples. | $MS_W = \frac{SS_W}{\text{d.f.}_D} = \frac{\sum (n_i - 1)s_i^2}{N - k}$
7. Find the test statistic. | $F = \frac{MS_B}{MS_W}$


Study Tip

The notations nᵢ, x̄ᵢ, and sᵢ² represent the sample size, mean, and variance
of the ith sample, respectively. Also, note that x̿ is sometimes called
the grand mean.


Note that in Step 1 of the guidelines above, you are summing the values from
just one sample. In Step 2, you are summing the values from all of the samples.
The sums SS_B and SS_W are explained on the next page.
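The seven steps translate directly into code. The Python sketch below is a hypothetical helper, `one_way_anova`, not any particular package's routine. It takes the raw entries of each sample, follows Steps 1 to 7 in order, and returns the intermediate sums along with F, so each value can be compared with a hand calculation. It assumes every sample has at least two entries, so that each sample variance is defined.

```python
def one_way_anova(samples):
    """Follow Steps 1-7 of the guidelines; `samples` is a list of lists of entries."""
    k = len(samples)
    sizes = [len(s) for s in samples]
    N = sum(sizes)
    # Step 1: mean and variance of each sample (n - 1 in the variance denominator).
    means = [sum(s) / len(s) for s in samples]
    variances = [sum((x - m) ** 2 for x in s) / (len(s) - 1)
                 for s, m in zip(samples, means)]
    # Step 2: grand mean of all entries in all samples.
    grand_mean = sum(sum(s) for s in samples) / N
    # Steps 3 and 4: sums of squares between and within the samples.
    ss_b = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ss_w = sum((n - 1) * v for n, v in zip(sizes, variances))
    # Steps 5 and 6: variances between and within the samples (mean squares).
    ms_b = ss_b / (k - 1)
    ms_w = ss_w / (N - k)
    # Step 7: test statistic.
    return {"SS_B": ss_b, "SS_W": ss_w, "df_N": k - 1, "df_D": N - k,
            "MS_B": ms_b, "MS_W": ms_w, "F": ms_b / ms_w}


relief_times = [
    [12, 15, 17, 12],      # Medication 1
    [16, 14, 21, 15, 19],  # Medication 2
    [14, 17, 20, 15],      # Medication 3
]
result = one_way_anova(relief_times)
for key, value in result.items():
    print(key, round(value, 4))
```

With the pain-reliever data from the example that follows, this gives F ≈ 1.50, as in the hand calculation. SS_B comes out as about 21.9231 rather than 21.9232 because the hand calculation rounds the grand mean to 15.92 first.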




In the guidelines for finding the test statistic for a one-way ANOVA test, the
notation SS_B represents the sum of squares between the samples.

$$SS_B = n_1(\bar{x}_1 - \bar{\bar{x}})^2 + n_2(\bar{x}_2 - \bar{\bar{x}})^2 + \cdots + n_k(\bar{x}_k - \bar{\bar{x}})^2 = \sum n_i(\bar{x}_i - \bar{\bar{x}})^2$$

Also, the notation SS_W represents the sum of squares within the samples.

$$SS_W = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2 = \sum (n_i - 1)s_i^2$$


GUIDELINES


Performing a One-Way Analysis of Variance Test
In Words | In Symbols

1. Verify that the samples are random and independent, the populations have normal distributions, and the population variances are equal.
2. Identify the claim. State the null and alternative hypotheses. | State H₀ and Hₐ.
3. Specify the level of significance. | Identify α.
4. Determine the degrees of freedom for the numerator and the denominator. | d.f.N = k − 1, d.f.D = N − k
5. Determine the critical value. | Use Table 7 in Appendix B.
6. Determine the rejection region.
7. Find the test statistic and sketch the sampling distribution. | $F = \frac{MS_B}{MS_W}$
8. Make a decision to reject or fail to reject the null hypothesis. | If F is in the rejection region, then reject H₀. Otherwise, fail to reject H₀.
9. Interpret the decision in the context of the original claim.


Tables are a convenient way to summarize the results of a one-way analysis
of variance test. ANOVA summary tables are set up as shown below.


ANOVA Summary Table
Variation | Sum of squares | Degrees of freedom | Mean squares | F
Between | SS_B | d.f.N = k − 1 | MS_B = SS_B/d.f.N | F = MS_B/MS_W
Within | SS_W | d.f.D = N − k | MS_W = SS_W/d.f.D |




Performing a One-Way ANOVA Test 


A medical researcher wants to determine whether there is a difference in the 
mean lengths of time it takes three types of pain relievers to provide relief from 
headache pain. Several headache sufferers are randomly selected and given 
one of the three medications. Each headache sufferer records the time (in 
minutes) it takes the medication to begin working. The results are shown in the 
table. At α = 0.01, can you conclude that at least one mean time is different
from the others? Assume that each population of relief times is normally 
distributed and that the population variances are equal. 


Medication 1 | Medication 2 | Medication 3
12 | 16 | 14
15 | 14 | 17
17 | 21 | 20
12 | 15 | 15
 | 19 |
n₁ = 4 | n₂ = 5 | n₃ = 4
x̄₁ = 56/4 = 14 | x̄₂ = 85/5 = 17 | x̄₃ = 66/4 = 16.5
s₁² = 6 | s₂² = 8.5 | s₃² = 7


SOLUTION 
The null and alternative hypotheses are as follows.
H₀: μ₁ = μ₂ = μ₃


Hₐ: At least one mean is different from the others. (Claim)


Because there are k = 3 samples, d.f.N = k − 1 = 3 − 1 = 2. The sum of the
sample sizes is N = n₁ + n₂ + n₃ = 4 + 5 + 4 = 13. So,


d.f.D = N − k = 13 − 3 = 10.


Using d.f.N = 2, d.f.D = 10, and α = 0.01, the critical value is F₀ = 7.56.
The rejection region is F > 7.56. To find the test statistic, first calculate x̿,
MS_B, and MS_W.

$$\bar{\bar{x}} = \frac{\sum x}{N} = \frac{56 + 85 + 66}{13} \approx 15.92$$

$$MS_B = \frac{SS_B}{\text{d.f.}_N} = \frac{\sum n_i(\bar{x}_i - \bar{\bar{x}})^2}{k - 1} = \frac{4(14 - 15.92)^2 + 5(17 - 15.92)^2 + 4(16.5 - 15.92)^2}{3 - 1} = \frac{21.9232}{2} = 10.9616$$

$$MS_W = \frac{SS_W}{\text{d.f.}_D} = \frac{\sum (n_i - 1)s_i^2}{N - k} = \frac{(4 - 1)(6) + (5 - 1)(8.5) + (4 - 1)(7)}{13 - 3} = \frac{73}{10} = 7.3$$


Using MS_B ≈ 10.9616 and MS_W = 7.3, the test statistic is

$$F = \frac{MS_B}{MS_W} \approx \frac{10.9616}{7.3} \approx 1.50.$$

The figure shows the location of the rejection region and the test
statistic F. Because F is not in the rejection region, you fail to reject
the null hypothesis.


Interpretation There is not enough evidence at the 1% level of significance
to conclude that there is a difference in the mean length of time it takes the
three pain relievers to provide relief from headache pain.


Picturing the World

A researcher wants to determine whether there is a difference in the mean
lengths of time wasted at work for people in California, Georgia, and
Pennsylvania. Several people from each state who work 8-hour days are
randomly selected and they are asked how much time (in hours) they waste
at work each day. The results are shown in the table. (Adapted from Salary.com)

CA | GA | PA
2 | 2 | 1.75
1.75 | 2.5 | 3
2.5 | 1.25 | 2.75
3 | 2.25 | 2
2.75 | 1.5 | 3
3.25 | 3 | 2.5
1.25 | 2.75 | 2.75
2 | 2.25 | 3.25
2.5 | 2 | 3
1.75 | 1 | …
1.5 | 2.25 | …
… | … | …

At α = 0.10, can the researcher conclude that there is a difference in the
mean lengths of time wasted at work among the states? Assume that each
population is normally distributed and that the population variances are equal.


The ANOVA summary table for Example 1 is shown below.

Variation | Sum of squares | Degrees of freedom | Mean squares | F
Between | 21.9232 | 2 | 10.9616 | 1.50
Within | 73 | 10 | 7.3 |


TRY IT YOURSELF 1


A sales analyst wants to determine whether there is a difference in the mean 
monthly sales of a company’s four sales regions. Several salespersons from 
each region are randomly selected and they provide their sales amounts (in 
thousands of dollars) for the previous month. The results are shown in the 
table. At α = 0.05, can the analyst conclude that there is a difference in the
mean monthly sales among the sales regions? Assume that each population of 
sales is normally distributed and that the population variances are equal. 


North | East | South | West
34 | 47 | 40 | 21
28 | 36 | 30 | 30
18 | 30 | 41 | 24
24 | 38 | 29 | 37
 | 44 | | 23
n₁ = 4 | n₂ = 5 | n₃ = 4 | n₄ = 5
x̄₁ = 26 | x̄₂ = 39 | x̄₃ = 35 | x̄₄ = 27
s₁² ≈ 45.33 | s₂² = 45 | s₃² ≈ 40.67 | s₄² = 42.5


Answer: Page A39 


Using technology greatly simplifies the one-way ANOVA process. When 
using technology such as Minitab, Excel, StatCrunch, or the TI-84 Plus to 
perform a one-way analysis of variance test, you can use P-values to decide 
whether to reject the null hypothesis. If the P-value is less than α, then reject H₀.


Compact | Midsize | Large
12 | 21 | 18
23 | 23 | 17
17 | 19 | 14
20 | 14 | 17
25 | 14 | 20
18 | 21 | 17
24 | 26 | 17
27 | 18 | 13
29 | 25 |
17 | 16 |
31 | 27 |
24 | 22 |
24 | 21 |
 | 25 |
 | 21 |


Tech Tip 


Here are instructions 
for performing a 
one-way analysis of 
variance test on a 
TI-84 Plus. Begin by 
storing the data in L1, 
L2, and so on. 


STAT 


Choose the TESTS menu. 
H: ANOVA( 


Then enter L1, L2, and so on, 
separated by commas. 




Using Technology to Perform a One-Way ANOVA Test 


A researcher believes that for city driving, the fuel economies of compact,
midsize, and large cars are the same. The gas mileages (in miles per gallon) for
city driving for several randomly selected cars from each category are shown in
the table at the left. Assume that the populations are normally distributed, the
samples are independent, and the population variances are equal. At α = 0.05,
can you reject the claim that the mean gas mileages for city driving are the
same for the three categories? Use technology to test the claim. (Source:
Fueleconomy.gov)


SOLUTION
Here are the null and alternative hypotheses.
H₀: μ₁ = μ₂ = μ₃ (Claim)
Hₐ: At least one mean is different from the others.


The results obtained by performing the test using Excel are shown below.
From the results, you can see that P ≈ 0.02. Because P < α, you reject the
null hypothesis.


Anova: Single Factor

SUMMARY
Groups | Count | Sum | Average | Variance
Compact | 13 | 291 | 22.38462 | 28.75641
Midsize | 15 | 313 | 20.86667 | 16.69524
Large | 8 | 133 | 16.625 | 4.839286

ANOVA
Source of Variation | SS | df | MS | F | P-value
Between Groups | 168.287 | 2 | 84.14348 | 4.532074 | 0.018236
Within Groups | 612.6853 | 33 | 18.56622 | |
Total | 780.9722 | 35 | | |


Interpretation There is enough evidence at the 5% level of significance to 
reject the claim that the mean gas mileages for city driving are the same. 
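With three samples, d.f.N = k − 1 = 2, and in that special case the right-tail area of the F-distribution has the closed form P = (1 + 2F/d.f.D)^(−d.f.D/2), so the Excel P-value can be checked by hand. The Python sketch below recomputes F from the Excel mean squares (84.14348/18.56622 with d.f.D = 33) and also applies the formula to Example 1 (F ≈ 1.50, d.f.D = 10). The function name is hypothetical, and the formula does not apply when d.f.N ≠ 2.

```python
def right_tail_area_dfn2(f_stat, dfd):
    """P(F > f_stat) for an F-distribution with d.f.N = 2 (closed form)."""
    if f_stat <= 0:
        return 1.0
    return (1.0 + 2.0 * f_stat / dfd) ** (-dfd / 2.0)


f_fuel = 84.14348 / 18.56622                  # MS_B / MS_W from the Excel output
p_fuel = right_tail_area_dfn2(f_fuel, 33)     # gas mileage example, alpha = 0.05
p_relief = right_tail_area_dfn2(1.50, 10)     # Example 1, alpha = 0.01
print(f"F = {f_fuel:.4f}, P = {p_fuel:.6f}",
      "reject H0" if p_fuel < 0.05 else "fail to reject H0")
print(f"P = {p_relief:.2f}", "reject H0" if p_relief < 0.01 else "fail to reject H0")
```

The first P-value should match Excel's 0.018236; the second, about 0.27, is far above α = 0.01, which agrees with the decision in Example 1.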


TRY IT YOURSELF 2 
The data shown in the table represent the GPAs of randomly selected freshmen, 
sophomores, juniors, and seniors. At α = 0.05, can you conclude that there is
a difference in the means of the GPAs? Assume that the populations of GPAs 
are normally distributed and that the population variances are equal. Use 
technology to test the claim. 

Freshmen: 2.34, 2.38, 3.31, 2.39, 3.40, 2.70, 2.34

Sophomores: 3.26, 2.22, 3.26, 3.29, 2.95, 3.01, 3.13, 3.59, 2.84, 3.00

Juniors: 2.80, 2.60, 2.49, 2.83, 2.34, 3.23, 3.49, 3.03, 2.87

Seniors: 3.31, 2.35, 3.27, 2.86, 2.78, 2.75, 3.05, 3.31


Answer: Page A39 




Two-Way ANOVA 


When you want to test the effect of two independent variables, or factors, on 
one dependent variable, you can use a two-way analysis of variance test. For 
instance, suppose a medical researcher wants to test the effect of gender and type 
of medication on the mean length of time it takes pain relievers to provide relief. 
To perform such an experiment, the researcher can use the two-way ANOVA 
block design shown below. 


Type of medication | Gender: M | Gender: F
I | Males taking type I | Females taking type I
II | Males taking type II | Females taking type II
III | Males taking type III | Females taking type III


A two-way ANOVA test has three null hypotheses: one for each main
effect and one for the interaction effect. A main effect is the effect of one
independent variable on the dependent variable. The interaction effect is the
combined effect of both independent variables on the dependent variable, that is,
whether the effect of one factor depends on the level of the other. For instance,
the hypotheses for the pain reliever experiment are listed below.


Hypotheses for main effects: 


Study Tip
If gender and type of medication have no effect on the length of time it
takes a pain reliever to provide relief, then there will be no significant
difference in the means of the relief times.


H₀: Gender has no effect on the mean length of time it takes a pain reliever
to provide relief.


Hₐ: Gender has an effect on the mean length of time it takes a pain reliever
to provide relief.


H₀: The type of medication has no effect on the mean length of time it takes
a pain reliever to provide relief.


Hₐ: The type of medication has an effect on the mean length of time it takes
a pain reliever to provide relief.


Hypotheses for interaction effect:


H₀: There is no interaction effect between gender and type of medication on
the mean length of time it takes a pain reliever to provide relief.


Hₐ: There is an interaction effect between gender and type of medication on
the mean length of time it takes a pain reliever to provide relief.


To test these hypotheses, you can perform a two-way ANOVA test. Note 
that the conditions for a two-way ANOVA test are the same as those for a 
one-way ANOVA test with the additional condition that all samples must be of 
equal size. Using the F-distribution, a two-way ANOVA test calculates an F-test 
statistic for each hypothesis. As a result, it is possible to reject none, one, two, or 
all of the null hypotheses. 

The statistics involved with a two-way ANOVA test are beyond the scope
of this course. You can, however, use technology such as Minitab to perform a
two-way ANOVA test.
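For readers curious what such software computes, here is a minimal Python sketch of the classical two-way ANOVA with replication for a balanced design (every cell holds the same number of observations, matching the equal-sample-size condition above). It is not Minitab's implementation; the function name two_way_anova and the 2 × 2 data set are hypothetical, and P-values are left to an F-table or other technology.

```python
from statistics import mean

def two_way_anova(cells):
    """F-statistics for the two main effects and the interaction effect.

    cells[i][j] holds the r observations for level i of factor A (rows) and
    level j of factor B (columns); every cell must hold the same r >= 2.
    """
    a, b, r = len(cells), len(cells[0]), len(cells[0][0])
    if r < 2 or any(len(cell) != r for row in cells for cell in row):
        raise ValueError("need equal cell sizes of at least 2")

    cell_means = [[mean(cell) for cell in row] for row in cells]
    grand = mean(x for row in cells for cell in row for x in cell)
    a_means = [mean(row) for row in cell_means]  # valid because the design is balanced
    b_means = [mean(cell_means[i][j] for i in range(a)) for j in range(b)]

    ss_a = b * r * sum((m - grand) ** 2 for m in a_means)
    ss_b = a * r * sum((m - grand) ** 2 for m in b_means)
    ss_ab = r * sum((cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
                    for i in range(a) for j in range(b))
    ss_e = sum((x - cell_means[i][j]) ** 2
               for i in range(a) for j in range(b) for x in cells[i][j])

    df_e = a * b * (r - 1)
    ms_e = ss_e / df_e
    return {
        "F_A": (ss_a / (a - 1)) / ms_e,                # main effect of factor A
        "F_B": (ss_b / (b - 1)) / ms_e,                # main effect of factor B
        "F_AB": (ss_ab / ((a - 1) * (b - 1))) / ms_e,  # interaction effect
        "df_D": df_e,
    }

# Hypothetical 2 x 2 design with r = 2 observations per cell.
example = [[[1, 3], [3, 5]],
           [[5, 7], [7, 9]]]
print(two_way_anova(example))
```

Each F is compared with the critical value for its own numerator degrees of freedom (a − 1, b − 1, or (a − 1)(b − 1)) and the shared denominator degrees of freedom ab(r − 1), which is why the test can reject none, one, two, or all three null hypotheses.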


10.4 EXERCISES 




For Extra Help: MyLab Statistics 


Building Basic Skills and Vocabulary 


1. State the null and alternative hypotheses for a one-way ANOVA test.

2. What conditions are necessary in order to use a one-way ANOVA test?

3. Describe the difference between the variance between samples MS_B and the
variance within samples MS_W.

4. Describe the hypotheses for a two-way ANOVA test.


Using and Interpreting Concepts 


Performing a One-Way ANOVA Test In Exercises 5-14, (a) identify
the claim and state H₀ and Hₐ, (b) find the critical value and identify the
rejection region, (c) find the test statistic F, (d) decide whether to reject or fail
to reject the null hypothesis, and (e) interpret the decision in the context of the
original claim. Assume the samples are random and independent, the populations
are normally distributed, and the population variances are equal.


5. Toothpaste The table shows the costs per ounce (in dollars) for a sample of
toothpastes exhibiting very good stain removal, good stain removal, and fair
stain removal. At α = 0.05, can you conclude that at least one mean cost per
ounce is different from the others? (Source: Consumer Reports)


Very good   0.47  0.49  0.41  0.37  0.48  0.51
Good        0.60  0.64  0.58  0.75  0.46
Fair        0.34  0.46  0.44  0.60


6. Automobile Batteries The table shows the prices (in dollars) for a sample
of automobile batteries. The prices are classified according to battery type.
At α = 0.05, is there enough evidence to conclude that at least one mean
battery price is different from the others? (Adapted from Consumer Reports)


Group size 35       110  100  125   90  120
Group size 65       280  145  180  175   90
Group size 24/24F   140  125   85  140   80


7. Vacuum Cleaners The table shows the weights (in pounds) for a sample of
vacuum cleaners. The weights are classified according to vacuum cleaner type.
At α = 0.01, can you conclude that at least one mean vacuum cleaner weight
is different from the others? (Source: Consumer Reports)


Bagged upright    21  22  23  21  17  19
Bagless upright   16  18  19  18  17  20
Bagged canister   26  24  23  25  27  21




8. Government Salaries The table shows the salaries (in thousands of
dollars) for a sample of individuals from the federal, state, and local
levels of government. At α = 0.01, can you conclude that at least one
mean salary is different from the others? (Adapted from Bureau of Labor
Statistics)


Federal State Local 


75.2 57.9 52.5 
67.9 42.0 42.0 
79.3 59.0 46.3 
87.1 59.5 44.7 
86.4 61.7 55.3 
90.5 66.8 49.4 
61.1 44.9 64.0 
76.0 55.4 44.5 
85.7 58.6 40.9 
69.4 52.4 37.1 


9. Ages of Professional Athletes The table shows the ages (in years) for
a sample of professional athletes from several sports. At α = 0.05, can
you conclude that at least one mean age is different from the
others? (Source: ESPN)


MLB   NBA   NFL   NHL


30 28 26 29 
25 27 28 23 
26 29 27 26 
31 30 26 30 
27 24 29 27 
29 27 27 25 
27 28 26 24 
25 33 26 26 
27 26 27 29 
23 28 Zh 32 
26 27 29 28 
34 28 25 25 
29 26 24 27 


10. Cost Per Mile The table shows the costs per mile (in cents) for a
sample of automobiles. At α = 0.01, can you conclude that at least one
mean cost per mile is different from the others? (Adapted from American
Automobile Association)


Small sedan   Medium sedan   Large sedan   SUV 4WD   Minivan


41 65 60 79 64 
39 47 69 58 74 
47 61 79 67 57 
52 57 71 70 49 
44 62 76 68 


50 68 


11. Well-Being Index The well-being index is a way to measure how people
are faring physically, emotionally, socially, and professionally, as well as
to rate the overall quality of their lives and their outlooks for the future.
The table shows the well-being index scores for a sample of states from four
regions of the United States. At α = 0.10, can you reject the claim that the
mean score is the same for all regions? (Adapted from Gallup and Healthways)


Northeast Midwest South
61.7 61.6 61.0
63.6 61.8 60.8
62.6 61.4 63.1
61.8 63.2 62.3
62.1 61.7 60.5
63.5 62.9 62.0

62.9 61.3
63.7 60.5
62.3
61.5
63.1


West
64.0
63.0
63.5
63.2
61.8
62.6
62.5
62.8
62.5


12. Days Spent at the Hospital In a recent study, a health insurance
company investigated the number of days patients spent at the hospital.
In part of the study, the company selected a sample of patients from
four regions of the United States and recorded the number of days
each patient spent at the hospital. The table shows the results of the
study. At α = 0.01, can the company reject the claim that the mean
number of days patients spend at the hospital is the same for all four
regions? (Adapted from National Center for Health Statistics)


Northeast Midwest South West 


6 6 3 
4 6 5 
7 7 6 
2 3 6 
3 5 3 
4 4 7 
6 4 4 
8 3 

9 2 


3 


Yann ft oO FA 


13. Personal Income The table shows the salaries of a sample of
individuals from six large metropolitan areas. At α = 0.05, can
you conclude that the mean salary is different in at least one of the
areas? (Adapted from U.S. Bureau of Economic Analysis)


Chicago     48,581  42,731  51,831  58,031  57,551  47,131
Dallas      42,524  39,709  46,209  57,704  46,909  59,259  53,269
Miami       49,357  53,207  40,557  52,357  44,907  48,757  53,557
Denver      48,790  49,970  53,990  57,290  60,565  51,390
San Diego   53,370  50,470  48,920  59,670  46,770
Seattle     63,678  54,043  51,943  58,543  63,418




14. Housing Prices The table shows the sale prices (in thousands of
dollars) of a sample of one-family houses in three cities. At α = 0.10,
can you conclude that at least one mean sale price is different from the
others? (Adapted from National Association of Realtors)


Gainesville Orlando Tampa 


173.0 243.9 230.7 
145.5 201.1 115.7 
190.6 185.3 211.0 
186.3 187.5 203.5 
248.7 207.9 149.9 
206.4 234.8 166.8 

86.8 253.2 134.1 
204.6 144.7 214.2 
174.5 163.9 105.5 
220.0 173.3 216.2 
173.0 


Extending Concepts 


Using Technology to Perform a Two-Way ANOVA Test In Exercises
15-18, use technology and the block design to perform a two-way ANOVA test. Use
α = 0.10. Interpret the results. Assume the samples are random and independent,
the populations are normally distributed, and the population variances are equal.


15. Advertising A study was conducted in which a sample of 20 adults
was asked to rate the effectiveness of advertisements. Each adult
rated a radio or television advertisement that lasted 30 or 60 seconds.
The block design shows these ratings (on a scale of 1 to 5, with 5 being
extremely effective).


Advertising medium 


Radio Television 
3 
= 30 sec D353, il, 3 3,5,4,1,2 
= 
$b 
5 60sec 4225 D5) 3) 4) al 
4 


16. Vehicle Sales The owner of a car dealership wants to determine
whether the gender of a salesperson and the type of vehicle sold affect
the number of vehicles sold in a month. The block design shows the
numbers of vehicles, listed by type, sold in a month by a sample of
eight salespeople.


Type of vehicle 
Car Truck Van/SUV 
Male 6,5,4,5 22 ies 4,3,4,2 


Gender 


Female 5, 1, 7 10M a (0), iL 




17. Grade Point Average A study was conducted in which a sample of
24 high school students was asked to give their grade point average
(GPA). The block design shows the GPAs of male and female students
from four different age groups.


Age 
15 16 17 18 
5 Male Da) Polly BH 4.0, 1.4, 2.0 So 22,2.0 3.1, 0.7, 2.8 
: 
© Female 4.0, 2.1, 1.9 25, SHO), Pall 4.0, 2.2, 1.7 110, Aes), 30 


18. Laptop Repairs The manager of a computer repair service wants
to determine whether there is a difference in the time it takes four
technicians to repair different brands of laptops. The block design
shows the times (in minutes) it took for each technician to repair three
laptops of each brand.


                                Technician
           Technician 1   Technician 2   Technician 3   Technician 4
Brand A    67, 82, 64     42, 56, 39     69, 47, 38     70, 44, 50
Brand B    44, 62, 55     47, 58, 62     55, 45, 66     47, 29, 40
Brand C    47, 36, 68     39, 74, 51     74, 80, 70     45, 62, 59


The Scheffé Test If the null hypothesis is rejected in a one-way ANOVA test
of three or more means, then a Scheffé Test can be performed to find which means
have a significant difference. In a Scheffé Test, the means are compared two at a
time. For instance, with three means you would have these comparisons: x̄₁ versus
x̄₂, x̄₁ versus x̄₃, and x̄₂ versus x̄₃. For each comparison, calculate

$$\frac{(\bar{x}_a - \bar{x}_b)^2}{\dfrac{SS_W}{\sum (n_i - 1)}\left(\dfrac{1}{n_a} + \dfrac{1}{n_b}\right)}$$

where x̄_a and x̄_b are the means being compared and n_a and n_b are the corresponding
sample sizes. Calculate the critical value by multiplying the critical value of the
one-way ANOVA test by k − 1. Then compare the value that is calculated using
the formula above with the critical value. The means have a significant difference
when the value calculated using the formula above is greater than the critical value.
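The Scheffé procedure described above can be sketched in a few lines of Python. For each pair it computes (x̄_a − x̄_b)² / [(SS_W / Σ(n_i − 1))(1/n_a + 1/n_b)], labeled F_S here purely for convenience (a hypothetical name), and compares it with the one-way ANOVA critical value times k − 1. The caller supplies that critical value from an F-table; the function name scheffe_test and the sample data are hypothetical.

```python
from itertools import combinations
from statistics import mean, variance

def scheffe_test(samples, f_crit):
    """Pairwise Scheffé comparisons after a one-way ANOVA rejects H0.

    samples: list of k raw-data samples.
    f_crit:  critical value of the one-way ANOVA test (d.f.N = k - 1,
             d.f.D = N - k), looked up by the caller in an F-table.
    Returns the Scheffé critical value and (a, b, F_S, significant) tuples.
    """
    k = len(samples)
    ss_w = sum((len(s) - 1) * variance(s) for s in samples)
    ms_w = ss_w / sum(len(s) - 1 for s in samples)  # SS_W / sum(n_i - 1)
    scheffe_cv = f_crit * (k - 1)
    results = []
    for i, j in combinations(range(k), 2):
        diff = mean(samples[i]) - mean(samples[j])
        f_s = diff ** 2 / (ms_w * (1 / len(samples[i]) + 1 / len(samples[j])))
        results.append((i + 1, j + 1, f_s, f_s > scheffe_cv))
    return scheffe_cv, results

# Hypothetical data; 5.14 is the F-table value for alpha = 0.05, d.f.N = 2, d.f.D = 6.
cv, comparisons = scheffe_test([[1, 2, 3], [4, 5, 6], [7, 8, 9]], f_crit=5.14)
for a, b, f_s, significant in comparisons:
    print(f"mean {a} vs mean {b}: F_S = {f_s:.2f}, significant: {significant}")
```

With these toy samples every pairwise value (13.5, 54, 13.5) exceeds the Scheffé critical value 5.14 × 2 = 10.28, so all three pairs of means differ significantly.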


Use the information above to solve Exercises 19-22. 


19. Refer to the data in Exercise 5. At α = 0.05, perform a Scheffé Test to
determine which means have a significant difference.

20. Refer to the data in Exercise 7. At α = 0.01, perform a Scheffé Test to
determine which means have a significant difference.

21. Refer to the data in Exercise 8. At α = 0.01, perform a Scheffé Test to
determine which means have a significant difference.

22. Refer to the data in Exercise 11. At α = 0.10, perform a Scheffé Test to
determine which means have a significant difference.


USES AND ABUSES Statistics in the Real World


One-Way Analysis of Variance (ANOVA) ANOVA can help you make 
important decisions about the allocation of resources. For instance, suppose 
you work for a large manufacturing company and part of your responsibility is 
to determine the distribution of the company’s sales throughout the world and 
decide where to focus the company’s efforts. Because wrong decisions will cost 
your company money, you want to make sure that you make the right decisions. 


Preconceived Notions There are several ways that the tests presented in this
chapter can be abused. For instance, it is easy to allow preconceived notions 
to affect the results of a chi-square goodness-of-fit test and a chi-square 
independence test. When testing to see whether a distribution has changed, 
do not let the existing distribution “cloud” the study results. Similarly, when 
determining whether two variables are independent, do not let your intuition 
“get in the way.” As with any hypothesis test, you must properly gather 
appropriate data and perform the corresponding test before you can reach a 
logical conclusion. 


Incorrect Interpretation of Rejection of Null Hypothesis It is important to 
remember that when you reject the null hypothesis of an ANOVA test, you are 
simply stating that you have enough evidence to determine that at least one of 
the population means is different from the others. You are not finding them all to 
be different. One way to further test which of the population means differs from 
the others is explained in Extending Concepts in Section 10.4 Exercises. 


1. Preconceived Notions ANOVA depends on having independent variables. 
Describe an abuse that might occur by having dependent variables. Then 
describe how the abuse could be avoided. 


2. Incorrect Interpretation of Rejection of Null Hypothesis Find an example 
of the use of ANOVA. In that use, describe what would be meant by 
“rejection of the null hypothesis.” How should rejection of the null hypothesis 
be correctly interpreted? 




10 Chapter Summary


What Did You Learn?                                              Example(s)   Review Exercises

Section 10.1
» How to use the chi-square distribution to test whether a         1-3          1-4
  frequency distribution fits an expected distribution
  $$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Section 10.2
» How to use a contingency table to find expected frequencies      1            5-8
  $$E_{r,c} = \frac{(\text{Sum of row } r) \cdot (\text{Sum of column } c)}{\text{Sample size}}$$
» How to use a chi-square distribution to test whether two          2, 3         5-8
  variables are independent

Section 10.3
» How to interpret the F-distribution and use an F-table to find    1, 2         9-16
  critical values
  $$F = \frac{s_1^2}{s_2^2}$$
» How to perform a two-sample F-test to compare two variances      3, 4         17-20

Section 10.4
» How to use one-way analysis of variance to test claims           1, 2         21, 22
  involving three or more means
  $$F = \frac{MS_B}{MS_W}$$
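The Section 10.1 statistic χ² = Σ(O − E)²/E, with d.f. = k − 1 for k categories, can be computed as in this minimal Python sketch. It assumes the expected frequencies come from claimed proportions (E = n·p) and enforces the usual requirement that each expected frequency is at least 5; the function name chi_square_gof and the survey numbers are hypothetical.

```python
def chi_square_gof(observed, expected_proportions):
    """Chi-square goodness-of-fit statistic and degrees of freedom (k - 1)."""
    n = sum(observed)
    expected = [n * p for p in expected_proportions]  # E = n * p for each category
    if any(e < 5 for e in expected):
        raise ValueError("each expected frequency should be at least 5")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1

# Hypothetical survey: 100 responses, claimed distribution 25%, 25%, 50%.
stat, df = chi_square_gof([30, 20, 50], [0.25, 0.25, 0.50])
print(f"chi-square = {stat:.2f}, d.f. = {df}")
```

Here the expected frequencies are 25, 25, and 50, so χ² = 25/25 + 25/25 + 0 = 2.00 with d.f. = 2. The goodness-of-fit test is always right-tailed, so this value is compared with the right-tailed critical value from the chi-square table.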


FIGURE FOR EXERCISE 2 (pie chart): Less than $10, 29%; $10 to $20, 16%;
More than $21, 9%; Don’t give one/other, 46%


FIGURE FOR EXERCISE 3 (pie chart): Short-game shots, 65%; Approach and swing, 22%;
Driver shots, 9%; Putting, 4%


10 Review Exercises 


Section 10.1 


In Exercises 1-4, (a) identify the claim and state H₀ and Hₐ, (b) find the critical
value and identify the rejection region, (c) find the chi-square test statistic,
(d) decide whether to reject or fail to reject the null hypothesis, and (e) interpret
the decision in the context of the original claim.


1. A researcher claims that the distribution of the lengths of visits at physician
offices is different from the distribution shown in the pie chart. You
randomly select 400 people and ask them how long their office visits with a
physician were. The table shows the results. At α = 0.01, test the researcher’s
claim. (Adapted from Medscape)


Survey results

Minutes        Frequency, f
Less than 9    20
10-12          80
13-16          113
17-20          91
21-24          40
25 or more     56


2. A researcher claims that the distribution of the amounts that parents give 
for an allowance is different from the distribution shown in the pie chart. 
You randomly select 1103 parents and ask them how much they give for an 
allowance. The table shows the results. At α = 0.10, test the researcher’s
claim. (Adapted from Echo Research) 


Survey results 


Response Frequency, f 
Less than $10 353 
$10 to $20 167 
More than $21 94 
Don’t give one/other 489 


3. A sports magazine claims that the opinions of golf students about what they 
need the most help with in golf are distributed as shown in the pie chart. You 
randomly select 435 golf students and ask them what they need the most 
help with in golf. The table shows the results. At α = 0.05, test the sports
magazine’s claim. (Adapted from PGA of America) 


Survey results 


Response Frequency, f 
Short-game shots 276 
Approach and swing 99 
Driver shots 42 
Putting 18 




4. An education researcher claims that the charges for tuition, fees, room,
and board at 4-year degree-granting postsecondary institutions are uniformly
distributed. To test this claim, you randomly select 800 4-year degree-granting
postsecondary institutions and determine the charges for tuition, fees, room,
and board at each. The table shows the results. At α = 0.05, test the education
researcher’s claim. (Adapted from National Center for Education Statistics)


Cost                 Frequency, f
$15,000-$17,499      138
$17,500-$19,999      154
$20,000-$22,499      246
$22,500-$24,999      169
$25,000 or more      93


Section 10.2


In Exercises 5-8, (a) find the expected frequency for each cell in the contingency
table, (b) identify the claim and state H₀ and Hₐ, (c) determine the degrees of
freedom, find the critical value, and identify the rejection region, (d) find the
chi-square test statistic, (e) decide whether to reject or fail to reject the null
hypothesis, and (f) interpret the decision in the context of the original claim.
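Step (a) uses E_{r,c} = (Sum of row r)(Sum of column c) / Sample size, and the chi-square independence test uses d.f. = (r − 1)(c − 1). The minimal Python sketch below puts the two together; the function names and the 2 × 2 table are hypothetical and are not taken from the exercises.

```python
def expected_frequencies(table):
    """E[r][c] = (sum of row r) * (sum of column c) / sample size."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    return [[r * c / n for c in col_sums] for r in row_sums]

def chi_square_independence(table):
    """Chi-square statistic and d.f. = (rows - 1)(columns - 1)."""
    expected = expected_frequencies(table)
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    return stat, (len(table) - 1) * (len(table[0]) - 1)

# Hypothetical 2 x 2 contingency table.
observed = [[10, 20],
            [20, 10]]
print(expected_frequencies(observed))
stat, df = chi_square_independence(observed)
print(f"chi-square = {stat:.3f}, d.f. = {df}")
```

Every row and column of the toy table sums to 30 out of 60, so each expected frequency is 30 · 30 / 60 = 15, and χ² = 4 · (5²/15) ≈ 6.667 with d.f. = 1.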


5. The contingency table shows the results of a random sample of public
elementary and secondary school teachers by gender and years of full-time
teaching experience. At α = 0.01, can you conclude that gender is related to
the years of full-time teaching experience? (Adapted from U.S. National Center
for Education Statistics)


                 Years of full-time teaching experience


Gender   Less than 3 years   3-9 years   10-20 years   20 years or more


Male 102 339 402 207 
Female 216 825 876 533 


6. The contingency table shows the results of a random sample of individuals by 
gender and type of vehicle owned. At α = 0.01, can you conclude that gender
is related to the type of vehicle owned? 


Type of vehicle owned 


Gender Car Truck SUV Van 


Male 85 95 44 8 
Female 110 73 61 4 


7. The contingency table shows the results of a random sample of endangered 
and threatened species by status and vertebrate group. At α = 0.05, test the
hypothesis that the variables are independent. (Adapted from U.S. Fish and 
Wildlife Service) 


Vertebrate group 


Status Mammals Birds Reptiles Amphibians Fish 


Endangered 151 137 37 18 50 
Threatened 23 17 22 12 33 




8. The contingency table shows the distribution of a random sample of fatal
pedestrian motor vehicle collisions by time of day and gender in a recent
year. At α = 0.10, can you conclude that time of day and gender are
related? (Adapted from National Highway Traffic Safety Administration)


                      Time of day
         12 A.M.-     6 A.M.-      12 P.M.-     6 P.M.-
Gender   5:59 A.M.    11:59 A.M.   5:59 P.M.    11:59 P.M.
Male 611 595 884 911 
Female 260 354 563 552 


Section 10.3 


In Exercises 9-12, find the critical F-value for a right-tailed test using the level of
significance α and degrees of freedom d.f.N and d.f.D.


9. α = 0.05, d.f.N = 6, d.f.D = 50
10. α = 0.01, d.f.N = 12, d.f.D = 10
11. α = 0.10, d.f.N = 5, d.f.D = 12
12. α = 0.05, d.f.N = 20, d.f.D = 25


In Exercises 13-16, find the critical F-value for a two-tailed test using the level of
significance α and degrees of freedom d.f.N and d.f.D.


13. α = 0.10, d.f.N = 15, d.f.D = 27
14. α = 0.05, d.f.N = 9, d.f.D = 8
15. α = 0.01, d.f.N = 40, d.f.D = 60
16. α = 0.01, d.f.N = 11, d.f.D = 13


In Exercises 17-20, (a) identify the claim and state H₀ and Hₐ, (b) find the critical
value and identify the rejection region, (c) find the test statistic F, (d) decide whether
to reject or fail to reject the null hypothesis, and (e) interpret the decision in the
context of the original claim. Assume the samples are random and independent,
and the populations are normally distributed.
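The two-sample F-test statistic is F = s₁²/s₂², where s₁² is the larger of the two sample variances, so F ≥ 1 and only the right tail of the F-table is needed. The sketch below is a minimal Python illustration working from sample standard deviations; the function name two_sample_f and the numbers in the usage line are hypothetical, not taken from the exercises.

```python
def two_sample_f(s1, n1, s2, n2):
    """Two-sample F statistic with the larger sample variance in the numerator.

    s1, s2: sample standard deviations; n1, n2: sample sizes.
    Returns (F, d.f.N, d.f.D).
    """
    if s1 < s2:  # swap so that s1**2 >= s2**2
        s1, n1, s2, n2 = s2, n2, s1, n1
    return s1 ** 2 / s2 ** 2, n1 - 1, n2 - 1

# Hypothetical samples: s = 6 with n = 30 and s = 3 with n = 20.
print(two_sample_f(6, 30, 3, 20))
```

Because the larger variance always goes in the numerator, the argument order does not matter. For a two-tailed claim ("the variances are different" or "the same"), look up the critical value using ½α, as in the two-tailed exercises above.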


17. A travel consultant claims that the standard deviations of hotel room rates 
for San Francisco, CA, and Sacramento, CA, are the same. A sample of 
36 hotel room rates in San Francisco has a standard deviation of $75 and 
a sample of 31 hotel room rates in Sacramento has a standard deviation 
of $44. At α = 0.01, can you reject the travel consultant’s claim? (Adapted
from I-Map Data Systems LLC) 


18. An agricultural analyst is comparing the wheat production in Oklahoma 
counties. The analyst claims that the variation in wheat production is greater 
in Garfield County than in Kay County. A sample of 21 Garfield County 
farms has a standard deviation of 0.76 bushel per acre. A sample of 16 Kay 
County farms has a standard deviation of 0.58 bushel per acre. At α = 0.10,
can you support the analyst’s claim? (Adapted from Environmental Verification 
and Analysis Center— University of Oklahoma) 


19. An instructor claims that the variance of SAT critical reading scores for
females is different from the variance of SAT critical reading scores for males.
The table shows the SAT critical reading scores for 12 randomly selected female
students and 12 randomly selected male students. At α = 0.01, can you support
the instructor’s claim?


Female        Male
480  600      560  310
610  800      680  730
340  540      360  740
630  750      530  520
520  650      380  560
690  630      460  400




20. A quality technician claims that the variance of the insert diameters
produced by a plastic company’s new injection mold for automobile
dashboard inserts is less than the variance of the insert diameters
produced by the company’s current mold. The table shows samples of
insert diameters (in centimeters) for both the current and new molds.
At α = 0.05, can you support the technician’s claim?


New       9.611  9.618  9.594  9.580  9.611  9.597
Current   9.571  9.642  9.650  9.651  9.596  9.636

New       9.638  9.568  9.605  9.603  9.647  9.590
Current   9.570  9.537  9.641  9.625  9.626  9.579


Section 10.4 


In Exercises 21 and 22, (a) identify the claim and state H₀ and Hₐ, (b) find
the critical value and identify the rejection region, (c) find the test statistic F,
(d) decide whether to reject or fail to reject the null hypothesis, and (e) interpret
the decision in the context of the original claim. Assume the samples are random
and independent, the populations are normally distributed, and the population
variances are equal.


21. The table shows the amounts spent (in dollars) on energy in one year
for a sample of households from four regions of the United States.
At α = 0.10, can you conclude that the mean amount spent on energy
in one year is different in at least one of the regions? (Adapted from
U.S. Energy Information Administration)


Northeast   Midwest   South   West
1896        1712      1689    1455
2606        2096      2256    1164
1649        1923      1834    1851
2436        2281      2365    1776
2811        2703      1958    2030
2384        2092      1947    1640
2840        1499      2433    1678
2445        2146      1578    1547


22. The table shows the annual incomes (in dollars) for a sample of families
from four regions of the United States. At α = 0.05, can you conclude that the
mean annual income is different in at least one of the regions? (Adapted from
U.S. Census Bureau)


Northeast   Midwest   South    West
78,123      54,930    52,623   70,496
69,388      78,543    76,365   62,904
78,251      76,602    50,668   59,113
54,379      57,357    50,373   57,191
75,210      54,907    48,536   60,668
70,119 63,073 60,415
46,833




10 Chapter Quiz 


Take this quiz as you would take a quiz in class. After you are done, check your 
work against the answers given in the back of the book. 


In each exercise, 

(a) identify the claim and state H₀ and Hₐ,

(b) find the critical value and identify the rejection region, 

(c) find the test statistic, 

(d) decide whether to reject or fail to reject the null hypothesis, and 


(e) interpret the decision in the context of the original claim. 


In Exercises 1 and 2, use the table, which lists the distribution of educational
attainment for people in the United States ages 25 and older. It also lists the
results of a random survey for two additional age groups. (Adapted from U.S.
Census Bureau)


Ages 
Educational attainment 25 and older 30-34 65-69 
None-8th grade 4.7% 10 23 
9th-11th grade 6.9% 23 26 
High school graduate 29.5% 80 136 
Some college, no degree 16.6% 56 77 
Associate’s degree 9.8% 34 41 
Bachelor’s degree 20.5% 75 78 
Master’s degree 8.7% 31 41 
Professional/doctoral degree 3.3% 11 18 


1. Does the distribution for people in the United States ages 25 and older 
differ from the distribution for people in the United States ages 30-34? 
Use α = 0.05.


2. Use the data for 30- to 34-year-olds and 65- to 69-year-olds to test whether age 
and educational attainment are related. Use α = 0.01.


In Exercises 3 and 4, use the data, which list the annual wages (in thousands
of dollars) for randomly selected individuals from three metropolitan 
areas. Assume the wages are normally distributed and that the samples are 
independent. (Adapted from U.S. Bureau of Economic Analysis) 


Ithaca, NY: 44.2, 51.5, 25.8, 28.3, 37.8, 38.0, 32.6, 41.8, 42.0, 40.6, 26.2, 
27.9, 48.3 


Little Rock, AR: 45.1, 38.1, 47.8, 34.4, 39.6, 47.1, 29.6, 54.8, 34.4, 40.3, 
40.1, 41.7, 40.9, 38.9, 25.9 


Madison, WI: 50.3, 41.8, 55.5, 40.8, 55.6, 38.6, 50.0, 46.8, 49.0, 52.9, 48.3, 
47.5, 39.2, 32.7, 54.1 


3. At α = 0.01, is there enough evidence to conclude that the variances of the
annual wages for Ithaca, NY, and Little Rock, AR, are different? 


4. Are the mean annual wages the same for all three cities? Use α = 0.10.
Assume that the population variances are equal. 




10 Chapter Test 


Take this test as you would take a test in class. 


In each exercise, 

(a) identify the claim and state H₀ and Hₐ,

(b) find the critical value and identify the rejection region, 

(c) find the test statistic, 

(d) decide whether to reject or fail to reject the null hypothesis, and 


(e) interpret the decision in the context of the original claim. 


In Exercises 1-3, use the data, which list the hourly wages (in dollars) for
randomly selected respiratory therapy technicians from three states. Assume 
the wages are normally distributed and that the samples are independent. 
(Adapted from U.S. Bureau of Labor Statistics) 


Maine: 20.92, 25.37, 23.06, 15.64, 27.72, 24.90, 19.26, 23.46, 18.49, 21.76, 22.36 

Oklahoma: 22.70, 19.95, 17.85, 16.76, 21.32, 18.96, 17.99, 28.35, 25.30, 21.93 

Massachusetts: 25.43, 23.21, 30.81, 26.62, 31.42, 31.34, 34.58, 28.22, 27.25, 
22.83, 27.45, 27.71 


1. At α = 0.05, is there enough evidence to conclude that the variances of the
hourly wages for respiratory therapy technicians in Maine and Massachusetts 
are the same? 


2. At α = 0.01, is there enough evidence to conclude that the variance of the
hourly wages for respiratory therapy technicians in Oklahoma is greater 
than the variance of the hourly wages for respiratory therapy technicians in 
Massachusetts? 


3. Are the mean hourly wages of respiratory therapy technicians the same for all
three states? Use α = 0.01. Assume that the population variances are equal.


In Exercises 4-6, use the table, which lists the distribution of the ages of workers
who carpool in Maine. It also lists the results of a random survey for two additional
states. (Adapted from U.S. Census Bureau)


                  State
Ages     Maine    Oklahoma   Massachusetts
16-19    6.1%     11         15
20-24    11.8%    27         22
25-44    41.8%    96         86
45-54    21.2%    37         42
55-59    10.1%    14         17
60+      9.0%     15         18


4. Does the distribution of the ages of workers who carpool in Maine differ from 
the distribution of the ages of workers who carpool in Oklahoma? Use α = 0.10.


5. Is the distribution of the ages of workers who carpool in Maine the same as the 
distribution of the ages of workers who carpool in Massachusetts? Use α = 0.01.


6. Use the data for Oklahoma and Massachusetts to test whether state and age 
are independent. Use α = 0.05.


Putting it all together 


REAL DECISIONS 


Fraud.org was created by the National Consumers League (NCL) 
to combat the growing problem of telemarketing and Internet 
fraud by improving prevention and enforcement. NCL works to 
protect and promote social and economic justice for consumers 
and workers in the United States and abroad. 

You work for the NCL as a statistical analyst. You are 
studying data on fraud. Part of your analysis involves testing the 
goodness-of-fit, testing for independence, comparing variances, 
and performing ANOVA. 


EXERCISES


1. Goodness-of-Fit
The table below shows an expected distribution of the ages of fraud victims.
The table also shows the results of a survey of 1000 randomly selected fraud
victims. Using α = 0.01, perform a chi-square goodness-of-fit test. What can
you conclude?

Ages        Expected distribution   Survey results
Under 18    0.66%                   8
18-25       14.92%                  148
26-35       22.09%                  206
36-45       17.29%                  171
46-55       17.05%                  175
56-65       14.71%                  153
Over 65     13.28%                  139


2. Independence
The contingency table below shows the results of a random sample of 2000
fraud victims classified by age and type of fraud. The frauds were committed
using bogus sweepstakes or credit card offers.

(a) Calculate the expected frequency for each cell in the contingency table.
Assume the variables age and type of fraud are independent.

(b) Can you conclude that the ages of the victims are related to the type of
fraud? Use α = 0.01.

                                           Age
Type of fraud   Under 20  20-29  30-39  40-49  50-59  60-69  70-79  80+   Total
Sweepstakes     10        60     70     130    90     160    280    200   1000
Credit cards    20        180    260    240    180    70     30     20    1000
Total           30        240    330    370    270    230    310    220   2000


TECHNOLOGY


Teacher Salaries


The Illinois State Board of Education conducts an annual study of the salaries
of Illinois teachers. The study looks at how teachers’ salaries are distributed
based on factors such as degree and experience level, district size, and
geographic region.

The table shows the beginning salaries of a random sample of Illinois teachers
from different-sized districts. District size is measured by the number of
students enrolled.

Under 500 students   1000-2999 students   At least 12,000 students
36,462               41,862               40,726
40,877               40,482               39,640
33,937               38,292               40,686
32,957               38,264               37,347
38,313               43,385               46,239
30,313               36,195               44,064
33,490               44,117               41,855
41,357               31,188               39,476
29,892               38,746               44,136
35,237               32,760               44,966
29,760               41,527               46,992
36,580               40,814               39,257
33,547               29,997               39,572


EXERCISES

In Exercises 1-3, refer to the samples listed below. Use α = 0.05.
(a) Under 500 students
(b) 1000-2999 students
(c) At least 12,000 students

1. Are the samples independent of each other? Explain.

2. Use technology to determine whether each sample is from a normal, or
approximately normal, population.

3. Use technology to determine whether the samples were selected from
populations having equal variances.

4. Using the results of Exercises 1-3, discuss whether the three conditions for a
one-way ANOVA test are satisfied. If so, use technology to test the claim that
teachers from districts of the three sizes have the same mean salary. Use α = 0.05.

5. Repeat Exercises 1-4 using the data in the table below. The table displays the
beginning salaries of a random sample of Illinois teachers from different
geographic regions of Illinois.

Northeast   Northwest   Southwest
43,652      32,569      37,176
age ie SoieA
37,836      29,265      39,337
536 650
nee sba02 305
42,228      32,495      37,233
- ae one
andl 7208 an
39,221      42,818      30,694
33,503      31,126      38,303
44,346      31,525      43,313
43,038      32,867      46,053
40,813      33,380      31,086
36,581      33,341      29,364
47,039      38,142      33,607


Extended solutions are given in the technology manuals that accompany this text. 
Technical instruction is provided for Minitab, Excel, and the TI-84 Plus. 
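The exercises call for technology; as one possible cross-check of Exercise 4, here is a minimal pure-Python sketch of the one-way ANOVA statistic F = MS_B / MS_W, with d.f.N = k − 1 and d.f.D = N − k. It assumes the three ANOVA conditions from Exercises 1-3 already hold; it does not check normality or equal variances. The function name `one_way_anova_f` is hypothetical.

```python
def one_way_anova_f(*groups):
    """Return (F, d.f.N, d.f.D) for a one-way ANOVA on the given samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation between samples and within samples
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Beginning salaries by district size, from the table above
under_500 = [36462, 40877, 33937, 32957, 38313, 30313, 33490,
             41357, 29892, 35237, 29760, 36580, 33547]
mid_size = [41862, 40482, 38292, 38264, 43385, 36195, 44117,
            31188, 38746, 32760, 41527, 40814, 29997]
large = [40726, 39640, 40686, 37347, 46239, 44064, 41855,
         39476, 44136, 44966, 46992, 39257, 39572]

f, dfn, dfd = one_way_anova_f(under_500, mid_size, large)
print(f"F = {f:.3f}, d.f.N = {dfn}, d.f.D = {dfd}")
```

Compare the printed F with the F-table critical value for d.f.N = 2, d.f.D = 36, and α = 0.05. The same function applies to the regional data in Exercise 5.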


Technology 601 


CHAPTERS 9 & 10


1. The table below shows the winning times (in seconds) for the men’s and women’s 100-meter runs in the Summer Olympics from 1928 to 2016. (Source: The International Association of Athletics Federations)


Men, x | 10.80 | 10.38 | 10.30 | 10.30 | 10.79 | 10.62 | 10.32
Women, y | 12.20 | 11.90 | 11.50 | 12.20 | 11.67 | 11.82 | 11.18

Men, x | 10.06 | 9.95 | 10.14 | 10.06 | 10.25 | 9.99 | 9.92
Women, y | 11.49 | 11.08 | 11.07 | 11.08 | 11.06 | 10.97 | 10.54

Men, x | 9.96 | 9.84 | 9.87 | 9.85 | 9.69 | 9.63 | 9.81
Women, y | 10.82 | 10.94 | 10.75 | 10.93 | 10.78 | 10.75 | 10.90


(a) Display the data in a scatter plot, calculate the correlation coefficient r, 
and describe the type of correlation. 

(b) At α = 0.05, is there enough evidence to conclude that there is
a significant linear correlation between the winning times for the
men’s and women’s 100-meter runs?

(c) Find the equation of the regression line for the data. Draw the 
regression line on the scatter plot. 

(d) Use the regression equation to predict the women’s 100-meter time 
when the men’s 100-meter time is 9.90 seconds. 
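For parts (a), (c), and (d), a minimal Python sketch using the computational formulas r = (nΣxy − ΣxΣy) / (√(nΣx² − (Σx)²) √(nΣy² − (Σy)²)), m = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²), and b = ȳ − m x̄. The data are the values as listed in the table above; the helper name `correlation_and_line` is hypothetical, and the significance test in part (b) still needs a critical value from a table.

```python
from math import sqrt

men = [10.80, 10.38, 10.30, 10.30, 10.79, 10.62, 10.32,
       10.06, 9.95, 10.14, 10.06, 10.25, 9.99, 9.92,
       9.96, 9.84, 9.87, 9.85, 9.69, 9.63, 9.81]
women = [12.20, 11.90, 11.50, 12.20, 11.67, 11.82, 11.18,
         11.49, 11.08, 11.07, 11.08, 11.06, 10.97, 10.54,
         10.82, 10.94, 10.75, 10.93, 10.78, 10.75, 10.90]


def correlation_and_line(x, y):
    """Return (r, slope m, intercept b) for the least-squares line y-hat = mx + b."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    r = (n * sxy - sx * sy) / (sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2))
    m = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b = sy / n - m * sx / n
    return r, m, b


r, m, b = correlation_and_line(men, women)
print(f"r = {r:.3f}; y-hat = {m:.3f}x + {b:.3f}")
print(f"Predicted women's time when men's time is 9.90 s: {m * 9.90 + b:.2f} s")
```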


2. The table below shows the residential natural gas expenditures (in dollars) in one year for a random sample of households in four regions of the United States. Assume that the populations are normally distributed and the population variances are equal. At α = 0.10, can you reject the claim that the mean expenditures are the same for all four regions? (Adapted from U.S. Energy Information Administration)

Northeast | Midwest | South | West
1608 | 449 | 509 | 591
7719 | 1036 | 394 | 504
964 | 665 | 769 | 1011
1303 | 1213 | 753 | 463
... | ... | ... | ...
1695 | 1393 | 574 | 324
785 | 926 | 526 | 515
778 | 866 | 1096 | 599


3. The equation used to predict the annual sweet potato yield (in pounds per acre) is $\hat{y} = 16{,}212 - 0.227x_1 + 0.212x_2$, where $x_1$ is the number of acres planted and $x_2$ is the number of acres harvested. Use the multiple regression equation to predict the annual sweet potato yields for the values of the independent variables. (Adapted from U.S. Department of Agriculture)

(a) $x_1 = 110{,}000$, $x_2 = 100{,}000$
(b) $x_1 = 125{,}000$, $x_2 = 115{,}000$
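Predicting with a multiple regression equation is just substitution. A small Python sketch of that arithmetic, using the coefficients given above (the function name is illustrative):

```python
def predict_yield(acres_planted, acres_harvested):
    """y-hat = 16,212 - 0.227 x1 + 0.212 x2 (pounds per acre)."""
    return 16212 - 0.227 * acres_planted + 0.212 * acres_harvested


for x1, x2 in [(110_000, 100_000), (125_000, 115_000)]:
    print(f"x1 = {x1:,}, x2 = {x2:,}: y-hat = {predict_yield(x1, x2):,.0f} lb/acre")
```

For part (a), for instance, 16,212 − 0.227(110,000) + 0.212(100,000) = 16,212 − 24,970 + 21,200 = 12,442 pounds per acre.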




4. A school administrator claims that the standard deviations of reading test scores for eighth-grade students are the same in Colorado and Utah. A random sample of 16 test scores from Colorado has a standard deviation of 34.6 points, and a random sample of 15 test scores from Utah has a standard deviation of 33.2 points. At α = 0.10, can you reject the administrator’s claim? Assume the samples are independent and each population has a normal distribution. (Adapted from National Center for Education Statistics)
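A minimal sketch of the two-sample F statistic, F = s₁²/s₂², where s₁² is the larger sample variance, with d.f.N = n₁ − 1 and d.f.D = n₂ − 1. Because the claim is one of equality, the test is two-tailed, so the right-tailed critical value is read from the F-table using α/2 = 0.05. The helper name is hypothetical.

```python
def variance_ratio_f(s_a, n_a, s_b, n_b):
    """Return (F, d.f.N, d.f.D), placing the larger sample variance in the numerator."""
    (s1, n1), (s2, n2) = sorted([(s_a, n_a), (s_b, n_b)], key=lambda p: p[0], reverse=True)
    return s1 ** 2 / s2 ** 2, n1 - 1, n2 - 1


f, dfn, dfd = variance_ratio_f(34.6, 16, 33.2, 15)  # Colorado, Utah
print(f"F = {f:.3f}, d.f.N = {dfn}, d.f.D = {dfd}")
```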


5. A researcher claims that the credit card debts of college students are distributed as shown in the pie chart. You randomly select 900 college students and record the credit card debt of each. The table shows the results. At α = 0.05, test the researcher’s claim. (Adapted from Sallie Mae, Inc.)

[Figure: pie chart of the claimed distribution of credit card debt; legible sectors: $1-$500, 42.2%; $501-$1000, 11.4%; $1001-$2000, 5.1%; $2001-$4000, 5.3%; more than $4000, 3.2%]

Survey results
Response | Frequency, f
$0 | 290
$1-$500 | 397
$501-$1000 | 97
$1001-$2000 | 54
$2001-$4000 | 40
More than $4000 | 22
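A chi-square goodness-of-fit test compares each observed frequency with E = np, where p is the claimed proportion, using χ² = Σ(O − E)²/E and d.f. = k − 1. A minimal Python sketch; to apply it here, pass the survey frequencies and the six claimed proportions from the pie chart (as decimals, in the same order). The function name is hypothetical.

```python
def chi_square_gof(observed, proportions):
    """Return (chi-square statistic, expected counts, d.f.) for a goodness-of-fit test."""
    if abs(sum(proportions) - 1) > 1e-9:
        raise ValueError("claimed proportions must sum to 1")
    n = sum(observed)
    expected = [n * p for p in proportions]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected, len(observed) - 1


# Survey frequencies, in the order $0, $1-$500, $501-$1000,
# $1001-$2000, $2001-$4000, more than $4000
observed = [290, 397, 97, 54, 40, 22]
```

Usage: `stat, expected, df = chi_square_gof(observed, claimed)`, where `claimed` lists the pie-chart percentages divided by 100. Each expected count should be at least 5 for the test to apply.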


6. Reviewing a Movie The contingency table shows how a random sample of adults rated a newly released movie and gender. At α = 0.05, can you conclude that the adults’ ratings are related to gender?

Rating
Gender | Excellent | Good | Fair | Poor
Male | 97 | 42 | 26 | 5
Female | 101 | 33 | 25 | 11


7. The table shows the metacarpal bone lengths (in centimeters) and the heights (in centimeters) of 12 adults. The equation of the regression line is $\hat{y} = 1.707x + 94.380$. (Adapted from the American Journal of Physical Anthropology)

Metacarpal bone length, x | 45 | 51 | 39 | 41 | 48 | 40
Height, y | 171 | 178 | 157 | 163 | 172 | 160

Metacarpal bone length, x | 46 | 43 | 47 | 42 | 49 | 44
Height, y | 173 | 175 | 173 | 169 | 183 | 172

(a) Find the coefficient of determination $r^2$ and interpret the results.
(b) Find the standard error of estimate $s_e$ and interpret the results.
(c) Construct a 95% prediction interval for the height of an adult whose metacarpal bone length is 50 centimeters. Interpret the results.
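A minimal Python sketch of all three parts, assuming the usual definitions: $r^2$ from the sample correlation, $s_e = \sqrt{\Sigma(y_i - \hat{y}_i)^2/(n - 2)}$ using the given line, and the prediction-interval margin $E = t_c s_e \sqrt{1 + 1/n + (x_0 - \bar{x})^2 / \Sigma(x - \bar{x})^2}$. The critical value $t_c = 2.228$ is the t-table entry for d.f. = 10 at level 0.95; it is hard-coded rather than computed.

```python
from math import sqrt

x = [45, 51, 39, 41, 48, 40, 46, 43, 47, 42, 49, 44]              # bone length (cm)
y = [171, 178, 157, 163, 172, 160, 173, 175, 173, 169, 183, 172]  # height (cm)
m, b = 1.707, 94.380  # regression line given in the exercise
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# (a) coefficient of determination from the sample correlation
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
r_squared = sxy ** 2 / (sxx * syy)

# (b) standard error of estimate, using residuals from the given line
sse = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
s_e = sqrt(sse / (n - 2))

# (c) 95% prediction interval at x0 = 50
t_c = 2.228  # t-table value, d.f. = n - 2 = 10
x0 = 50
y_hat = m * x0 + b
E = t_c * s_e * sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)

print(f"r^2 = {r_squared:.3f}, s_e = {s_e:.3f}")
print(f"{y_hat - E:.2f} < y < {y_hat + E:.2f}")
```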


Cumulative Review 


603 


11
Nonparametric Tests

11.1
11.2
Case Study


11.3 
11.4 


11.5 


Uses and Abuses 
Real Statistics—Real Decisions 
Technology 


In a recent year, the most common form of reported identity theft was employment- or tax-related fraud, which accounted for 34% of cases. The second most common form was credit card fraud, which accounted for 33% of cases.




Where You’ve Been


Up to this point in the text, you have studied dozens of 
different statistical formulas and tests that can help you in 
a decision-making process. Specific conditions had to be 
satisfied in order to use these formulas and tests. 


Suppose it is believed that as the number of fraud complaints in a state increases, the number of identity theft victims also increases. Can this belief be supported by actual data? The table below shows the numbers of fraud complaints and the numbers of identity theft victims for 25 randomly selected states in a recent year. (Source: Federal Trade Commission)

Fraud complaints | 39,344 | 45,528 | 33,745 | 21,117 | 7593 | 117,189 | 5768 | 7800 | 14,635
Identity theft victims | 4007 | 8748 | 6203 | 4933 | 1484 | 12,787 | 789 | 1348 | 2532

Fraud complaints | 5642 | 48,594 | 107,557 | 4600 | 25,636 | 7525 | 112,006 | 77,213
Identity theft victims | 1170 | 8251 | 17,430 | 711 | 3993 | 1352 | 20,205 | 11,009

Fraud complaints | 20,350 | 22,385 | 7206 | 2775 | 51,036 | 12,750 | 40,423 | 9948
Identity theft victims | 3337 | 4312 | 1216 | 503 | 5718 | 2540 | 8310 | 1093


Where You’re Going


In this chapter, you will study additional statistical tests
that do not require the population distribution to meet any
specific conditions. Each of these tests is useful in
real-life applications.

With the data above, the number of fraud complaints F and 
the number of identity theft victims V can be related by the 
regression equation V = 0.145F + 429.103. The correlation 
coefficient is approximately 0.915, so there is a strong 
positive correlation. You can determine that the correlation 
is significant by using Table 11 in Appendix B. Further 
analysis of the data, however, can show that the variables 
do not appear to have a bivariate normal distribution, which 
is one of the requirements for using the Pearson correlation 
coefficient. 


So, although a simple correlation test might indicate a 
relationship between the number of fraud complaints and 
the number of identity theft victims, one might question 
the results because the data do not fit the requirements for 
the test. Similar tests you will study in this chapter, such as 


Spearman’s rank correlation test, will give you additional
information. Spearman’s rank correlation coefficient
for these data is approximately 0.962. At α = 0.01, there is in
fact a significant correlation between the number of fraud
complaints and the number of identity theft victims for
each state.
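Spearman’s rank correlation replaces each value with its rank before measuring association, which is why it does not need a bivariate normal distribution. A minimal Python sketch, assuming tied values receive the average of the ranks they occupy and using the shortcut formula $r_s = 1 - 6\Sigma d^2 / (n(n^2 - 1))$, which is exact only when there are no ties. The function names are hypothetical; the fraud-complaint and identity-theft columns of the table above can be passed in as two lists of equal length.

```python
def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rank correlation coefficient via r_s = 1 - 6*sum(d^2)/(n(n^2 - 1))."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))
```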


[Figure: Number of Fraud Complaints and Identity Theft Victims for 25 States. Scatter plot of identity theft victims (vertical axis, up to 25,000) against fraud complaints (horizontal axis, up to 120,000).]




606 CHAPTER 11. Nonparametric Tests 


The Sign Test


What You Should Learn
• How to use the sign test to test a population median
• How to use the paired-sample sign test to test the difference between two population medians (dependent samples)

The Sign Test for a Population Median • The Paired-Sample Sign Test

The Sign Test for a Population Median
Many of the hypothesis tests studied so far have imposed one or more requirements for a population distribution. For instance, some tests require that a population must have a normal distribution, and other tests require that population variances be equal. What should you do when such requirements cannot be met? For these cases, statisticians have developed hypothesis tests that are “distribution free.” Such tests are called nonparametric tests.


DEFINITION 


A nonparametric test is a hypothesis test that does not require any specific conditions concerning the shapes of population distributions or the values of population parameters.


Nonparametric tests are usually easier to perform than corresponding 
parametric tests. They are, however, usually less efficient than parametric 
tests. Stronger evidence is required to reject a null hypothesis using the 
results of a nonparametric test. Consequently, whenever possible, you should 
use a parametric test. One of the easiest nonparametric tests to perform is the 
sign test. The only condition necessary to use a sign test is that the sample is 
randomly selected. 


DEFINITION
The sign test is a nonparametric test that can be used to test a population median against a hypothesized value k.

Study Tip
For many nonparametric tests, statisticians test the median instead of the mean.


The sign test for a population median can be left-tailed, right-tailed, or 
two-tailed. The null and alternative hypotheses for each type of test are 
shown below. 


Left-tailed test:
$H_0$: median ≥ k and $H_a$: median < k

Right-tailed test:
$H_0$: median ≤ k and $H_a$: median > k

Two-tailed test:
$H_0$: median = k and $H_a$: median ≠ k


To use the sign test, first compare each entry in the sample with the hypothesized median k. When the entry is below k, assign it a − sign; when the entry is above k, assign it a + sign; and when the entry is equal to k, assign it a 0. Then compare the number of + and − signs. (The 0’s are ignored.) When there is a large difference between the number of + signs and the number of − signs, it is likely that the median is different from the hypothesized value and you should reject the null hypothesis.


Study Tip
Because the 0’s are ignored, there are two possible outcomes when comparing a data entry with a hypothesized median: a + or a − sign. If the median is k, then about half of the values will be above k and half will be below. As such, the probability for each sign is 0.5. Table 8 in Appendix B is constructed using the binomial distribution where p = 0.5.

When n > 25, you can use the normal approximation (with a continuity correction) for the binomial. In this case, use $\mu = np = 0.5n$ and $\sigma = \sqrt{npq} = \sqrt{n}/2$.


SECTION 11.1 The Sign Test 607 


Table 8 in Appendix B lists the critical values for the sign test for selected levels of significance and sample sizes. When the sign test is used, the sample size n is the total number of + and − signs. When the sample size is greater than 25, you can use the standard normal distribution to find the critical values.
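Because Table 8 is built from the binomial distribution with p = 0.5, a critical value can be sketched as the largest count c with P(X ≤ c) ≤ α, where X is the number of signs of one type. This is an assumption about how the table is read, not a copy of it; it does reproduce the two critical values used in this section’s examples (5 for n = 19 at α = 0.05, and 1 for n = 10 at α = 0.025). The function name is hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def sign_test_critical_value(n, alpha, two_tailed=False):
    """Largest c with P(X <= c) <= alpha for X ~ Binomial(n, 0.5), or None if none exists.

    For a two-tailed test, the left tail receives alpha / 2.
    """
    alpha_tail = alpha / 2 if two_tailed else alpha
    total = 2 ** n
    cumulative = 0
    critical = None
    for c in range(n + 1):
        cumulative += comb(n, c)  # outcomes with exactly c signs of one type
        if cumulative / total <= alpha_tail:
            critical = c
        else:
            break
    return critical


print(sign_test_critical_value(19, 0.05))   # one-tailed, n = 19
print(sign_test_critical_value(10, 0.025))  # one-tailed, n = 10
```

A result of `None` means no count is small enough to reject at that level, which happens for very small samples.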


Test Statistic for the Sign Test
When n ≤ 25, the test statistic for the sign test is x, the smaller number of + or − signs.

When n > 25, the test statistic for the sign test is

$$z = \frac{(x + 0.5) - 0.5n}{\sqrt{n}/2}$$

where x is the smaller number of + or − signs and n is the sample size, i.e., the total number of + and − signs.
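The large-sample statistic is a standardized binomial count: adding 0.5 to x is the continuity correction, 0.5n is the mean, and √n/2 is the standard deviation. A short Python sketch (function name illustrative), checked against the museum example later in this section, where x = 42 and n = 121:

```python
from math import sqrt


def sign_test_z(x, n):
    """Left-tailed normal approximation: z = ((x + 0.5) - 0.5n) / (sqrt(n) / 2), x = smaller count."""
    return ((x + 0.5) - 0.5 * n) / (sqrt(n) / 2)


print(round(sign_test_z(42, 121), 2))  # (42.5 - 60.5) / 5.5
```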


Because x is defined to be the smaller number of + or − signs, the rejection region is always in the left tail. Consequently, the sign test for a population median is always a left-tailed test or a two-tailed test. When the test is two-tailed, use only the left-tailed critical value. (When x is defined to be the larger number of + or − signs, the rejection region is always in the right tail. Right-tailed sign tests are presented in the exercises.)


GUIDELINES
Performing a Sign Test for a Population Median
In Words | In Symbols
1. Verify that the sample is random. |
2. Identify the claim. State the null and alternative hypotheses. | State $H_0$ and $H_a$.
3. Specify the level of significance. | Identify α.
4. Determine the sample size n by assigning + signs, − signs, and 0’s to the sample data. | n = total number of + and − signs
5. Determine the critical value. | When n ≤ 25, use Table 8 in Appendix B. When n > 25, use Table 4 in Appendix B.
6. Find the test statistic. | When n ≤ 25, use x = smaller number of + or − signs. When n > 25, use $z = \frac{(x + 0.5) - 0.5n}{\sqrt{n}/2}$.
7. Make a decision to reject or fail to reject the null hypothesis. | If the test statistic is less than or equal to the critical value, then reject $H_0$. Otherwise, fail to reject $H_0$.
8. Interpret the decision in the context of the original claim. |




Using the Sign Test 


A website administrator for a company claims that the median number of 
visitors per day to the company’s website is no more than 1500. An employee 
doubts the accuracy of this claim. The numbers of visitors per day for 
20 randomly selected days are listed below. At α = 0.05, can the employee
reject the administrator’s claim?


1469 1462 1634 1602 1500
1463 1476 1570 1544 1452
1487 1523 1525 1548 1511
1579 1620 1568 1492 1649


SOLUTION 
The claim is “the median number of visitors per day to the company’s website 
is no more than 1500.” So, the null and alternative hypotheses are 

$H_0$: median ≤ 1500 (Claim) and $H_a$: median > 1500.


To compare each data entry with the hypothesized median 1500, subtract 1500 from each data entry and assign the appropriate sign or 0. For instance, here are the comparisons for the first row of data entries.

1469 − 1500 = −31, assign a − sign
1462 − 1500 = −38, assign a − sign
1634 − 1500 = +134, assign a + sign
1602 − 1500 = +102, assign a + sign
1500 − 1500 = 0, assign a 0


The results of comparing each data entry with the hypothesized median 1500 are shown.

− − + + 0
− − + + −
− + + + +
+ + + − +


You can see that there are 7 − signs and 12 + signs. So, n = 12 + 7 = 19. Because n ≤ 25, use Table 8 in Appendix B to find the critical value. The test is a one-tailed test with α = 0.05 and n = 19. So, the critical value is 5. Because n ≤ 25, the test statistic x is the smaller number of + or − signs. So, x = 7. Because x = 7 is greater than the critical value, the employee should fail to reject the null hypothesis.


Interpretation There is not enough evidence at the 5% level of significance 
for the employee to reject the website administrator’s claim that the median 
number of visitors per day to the company’s website is no more than 1500. 
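The whole procedure in this example can be reproduced in a few lines. This minimal Python sketch counts signs, drops the 0’s, takes x as the smaller count, and finds the critical value as the largest c with P(X ≤ c) ≤ α for a binomial distribution with p = 0.5 (an assumption about how Table 8 is built that matches the value 5 used above). Variable names are illustrative.

```python
from math import comb

visitors = [1469, 1462, 1634, 1602, 1500,
            1463, 1476, 1570, 1544, 1452,
            1487, 1523, 1525, 1548, 1511,
            1579, 1620, 1568, 1492, 1649]
k = 1500       # hypothesized median
alpha = 0.05   # one-tailed test

plus = sum(1 for v in visitors if v > k)
minus = sum(1 for v in visitors if v < k)
n = plus + minus       # entries equal to k (0's) are ignored
x = min(plus, minus)   # test statistic when n <= 25

# Largest c with P(X <= c) <= alpha when X ~ Binomial(n, 0.5)
critical = max(
    (c for c in range(n + 1)
     if sum(comb(n, i) for i in range(c + 1)) <= alpha * 2 ** n),
    default=None,
)
decision = "reject H0" if critical is not None and x <= critical else "fail to reject H0"
print(plus, minus, n, x, critical, decision)
```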


TRY IT YOURSELF 1 


A real estate agency claims that the median number of days a home is on 
the market in its city is greater than 120. A homeowner wants to verify the 
accuracy of this claim. The numbers of days on the market for 24 randomly 
selected homes are shown below. At α = 0.025, can the homeowner support
the agency’s claim?

118 167 72 79 76 106 102 113
73 119 162 114 120 93 135 147
77 157 115 88 152 70 65 91

Answer: Page T1


Picturing the World


For recent college graduates in the 
United States, a financial analyst 
claims that the median auto loan 

is $21,883. A random sample of 
recent college graduates reveals 
that the loans for 42 graduates were 
less than $21,883 and the loans 

for 35 graduates were greater than 
$21,883. (Adapted from lendedu.com) 


Would you use a parametric test 
or a nonparametric test to test 
the claim that for recent college 
graduates in the United States, 
the median auto loan is $21,883? 
Explain your reasoning. 


Study Tip 
When performing a 
two-tailed sign test, 
remember to use only the left-tailed 
critical value. 




Using the Sign Test 


An organization claims that the median annual attendance for museums in 
the United States is at least 39,000. A random sample of 125 museums reveals 
that the annual attendances for 79 museums were less than 39,000, the annual 
attendances for 42 museums were more than 39,000, and the annual attendances 
for 4 museums were 39,000. At α = 0.01, is there enough evidence to reject the
organization’s claim? (Adapted from American Association of Museums)


SOLUTION 


The claim is “the median annual attendance for museums in the United States 
is at least 39,000.” So, the null and alternative hypotheses are 


$H_0$: median ≥ 39,000 (Claim) and $H_a$: median < 39,000.


Because n > 25, use Table 4 in Appendix B, the Standard Normal Table, to find the critical value. Because the test is a left-tailed test with α = 0.01, the critical value is $z_0 = -2.33$. Of the 125 museums, there are 79 − signs and 42 + signs. When the 0’s are ignored, the sample size is

n = 79 + 42 = 121, and x = 42.

With these values, the test statistic is

$$z = \frac{(42 + 0.5) - 0.5(121)}{\sqrt{121}/2} = \frac{-18}{5.5} \approx -3.27.$$
The figure shows the location of the rejection region and the test statistic z.

[Figure: standard normal curve with z from −4 to 4; the rejection region lies to the left of $z_0 = -2.33$]

Because z is less than the critical value, it is in the rejection region. So, you reject the null hypothesis.


Interpretation There is enough evidence at the 1% level of significance to 
reject the organization’s claim that the median annual attendance for museums 
in the United States is at least 39,000. 


TRY IT YOURSELF 2 


An organization claims that the median age of museum workers in the United 
States is 46 years old. A random sample of 95 museum workers reveals that 
57 museum workers were less than 46 years old, 34 museum workers were 
more than 46 years old, and 4 museum workers were 46 years old. At α = 0.10,
can you reject the organization’s claim? (Adapted from American Association of
Museums)

Answer: Page T1 




The Paired-Sample Sign Test 


In Section 8.3, you learned how to use a t-test for the difference between means 
of dependent samples. That test required both populations to be normally 
distributed. When the parametric condition of normality cannot be satisfied, you 
can use the paired-sample sign test to test the difference between two population 
medians. To perform the paired-sample sign test for the difference between 
two population medians, these conditions must be met. 


1. A sample must be randomly selected from each population. 


2. The samples must be dependent (paired). 


The paired-sample sign test can be left-tailed, right-tailed, or two-tailed. 
This test is similar to the sign test for a single population median. However, 
instead of comparing each data entry with a hypothesized median and recording 
a +, −, or 0, you find the difference between corresponding data entries and
record the sign of the difference. Generally, to find the difference, subtract the
entry representing the second variable from the entry representing the first
variable. Then compare the number of + and − signs. (The 0’s are ignored.)
When the number of + signs is approximately equal to the number of − signs,
you should fail to reject the null hypothesis. When there is a large difference
between the number of + signs and the number of − signs, you should reject the
null hypothesis. 


GUIDELINES
Performing a Paired-Sample Sign Test
In Words | In Symbols
1. Verify that the samples are random and dependent. |
2. Identify the claim. State the null and alternative hypotheses. | State $H_0$ and $H_a$.
3. Specify the level of significance. | Identify α.
4. Determine the sample size n by finding the difference for each data pair. Assign a + sign for a positive difference, a − sign for a negative difference, and a 0 for no difference. | n = total number of + and − signs
5. Determine the critical value. | Use Table 8 in Appendix B.
6. Find the test statistic. | x = smaller number of + or − signs
7. Make a decision to reject or fail to reject the null hypothesis. | If the test statistic is less than or equal to the critical value, then reject $H_0$. Otherwise, fail to reject $H_0$.
8. Interpret the decision in the context of the original claim. |




Using the Paired-Sample Sign Test 


A psychologist claims that the number of repeat offenders will decrease when 
first-time offenders complete a particular rehabilitation course. You randomly 
select 10 prisons and record the number of repeat offenders during a 
two-year period. Then, after first-time offenders complete the course, you 
record the number of repeat offenders at each prison for another two-year 
period. The results are shown in the table below. At α = 0.025, can you
support the psychologist’s claim?


Prison | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Before | 21 | 34 | 9 | 45 | 30 | 54 | 37 | 36 | 33 | 40
After | 19 | 22 | 16 | 31 | 21 | 30 | 22 | 18 | 17 | 21


SOLUTION 
To support the psychologist’s claim, use the null and alternative hypotheses 
below. 

$H_0$: The number of repeat offenders will not decrease.

$H_a$: The number of repeat offenders will decrease. (Claim)


The table below shows the sign of the differences between the “before” and 
“after” data. 


Prison | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Before | 21 | 34 | 9 | 45 | 30 | 54 | 37 | 36 | 33 | 40
After | 19 | 22 | 16 | 31 | 21 | 30 | 22 | 18 | 17 | 21
Sign | + | + | − | + | + | + | + | + | + | +


You can see that there is 1 − sign and there are 9 + signs. So, n = 1 + 9 = 10. Because the test is a one-tailed test with α = 0.025 and n = 10, the critical value is 1. The test statistic x is the smaller number of + or − signs. So, x = 1. Because x is equal to the critical value, you reject the null hypothesis.


Interpretation There is enough evidence at the 2.5% level of significance 
to support the psychologist’s claim that the number of repeat offenders will 
decrease. 
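The paired-sample version differs only in that signs come from the differences “before − after.” A minimal Python sketch reproducing this example, with the critical value taken as the largest c satisfying P(X ≤ c) ≤ α for a binomial distribution with p = 0.5 (an assumption about Table 8 that matches the value 1 used above). Variable names are illustrative.

```python
from math import comb

before = [21, 34, 9, 45, 30, 54, 37, 36, 33, 40]
after = [19, 22, 16, 31, 21, 30, 22, 18, 17, 21]
alpha = 0.025  # one-tailed test

diffs = [b - a for b, a in zip(before, after)]  # first variable minus second variable
plus = sum(1 for d in diffs if d > 0)
minus = sum(1 for d in diffs if d < 0)
n = plus + minus       # zero differences are ignored
x = min(plus, minus)

critical = max(
    (c for c in range(n + 1)
     if sum(comb(n, i) for i in range(c + 1)) <= alpha * 2 ** n),
    default=None,
)
decision = "reject H0" if critical is not None and x <= critical else "fail to reject H0"
print(plus, minus, n, x, critical, decision)
```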


TRY IT YOURSELF 3 


A medical researcher claims that a new vaccine will decrease the number of 
colds in adults. You randomly select 14 adults and record the number of colds 
each has in a one-year period. After giving the vaccine to each adult, you again 
record the number of colds each has in a one-year period. The results are 
shown in the table below. At α = 0.05, can you support the researcher’s claim?


Adult | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14
Before vaccine | …
After vaccine | …


Answer: Page T1 




11.1 EXERCISES 


For Extra Help: MyLab Statistics 


Building Basic Skills and Vocabulary 


1. What is a nonparametric test? How does a nonparametric test differ from a parametric test? What are the advantages and disadvantages of using a nonparametric test?

2. When the sign test is used, what population parameter is being tested?

3. Describe the test statistic for the sign test when the sample size n is less than or equal to 25 and when n is greater than 25.

4. In your own words, explain why the hypothesis test discussed in this section is called the sign test.

5. Explain how to use the sign test to test a population median.

6. List the two conditions that must be met in order to use the paired-sample sign test.


Using and Interpreting Concepts 


Performing a Sign Test In Exercises 7-22, (a) identify the claim and state $H_0$ and $H_a$, (b) find the critical value, (c) find the test statistic, (d) decide whether to reject or fail to reject the null hypothesis, and (e) interpret the decision in the context of the original claim.


7. Credit Card Charges A financial service accountant claims that the median credit card balance of college students is more than $300. You randomly select the credit card accounts of 12 college students and record the balance for each account. The balances (in dollars) are listed below. At α = 0.01, can you support the accountant’s claim? (Adapted from Sallie Mae)


346.71 382.59 255.03 202.17 309.80 265.88 
299.41 270.38 296.54 318.46 245.92 309.47 


8. Temperature A meteorologist claims that the median daily high temperature for the month of July in Pittsburgh is 83° Fahrenheit. The high temperatures (in degrees Fahrenheit) for 15 randomly selected July days in Pittsburgh are listed below. At α = 0.01, is there enough evidence to reject the meteorologist’s claim? (Adapted from U.S. National Oceanic and Atmospheric Administration)


74 79 81 86 90 79 81 83 81 74 78 76 84 82 85 


9. Sales Prices of Homes A real estate agent claims that the median sales price of new privately owned one-family homes sold in a recent month is $253,000 or less. The sales prices (in dollars) of 10 randomly selected homes are listed below. At α = 0.05, is there enough evidence to reject the agent’s claim? (Adapted from National Association of Realtors)


262,600 300,100 269,200 249,400 183,400 
253,500 325,600 223,500 241,300 271,300 


10. Temperature During a weather report, a meteorologist claims that the median daily high temperature for the month of January in San Diego is 66° Fahrenheit. The high temperatures (in degrees Fahrenheit) for 16 randomly selected January days in San Diego are listed below. At α = 0.01, can you reject the meteorologist’s claim? (Adapted from U.S. National Oceanic and Atmospheric Administration)


78 74 72 72 70 70 72 78 74 71 72 74 77 79 75 73 


11. Credit Card Debt A financial services institution claims that the median amount of credit card debt for families holding such debts is at least $2300. In a random sample of 104 families with credit card debt, the debts of 60 families were less than $2300 and the debts of 44 families were greater than $2300. At α = 0.02, can you reject the institution’s claim? (Adapted from Board of Governors of the Federal Reserve System)


12. Financial Debt A financial services accountant claims that the median amount of financial debt for families holding such debts is less than $60,000. In a random sample of 70 families with financial debt, the debts of 24 families were less than $60,000 and the debts of 46 families were greater than $60,000. At α = 0.025, can you support the accountant’s claim? (Adapted from Board of Governors of the Federal Reserve System)


13. Social Media A research group claims that the median age of the users of a social media website is greater than 30 years old. In a random sample of 24 users, 11 are less than 30 years old, 10 are more than 30 years old, and 3 are 30 years old. At α = 0.01, can you support the research group’s claim? (Adapted from Pew Research Center)


14. Social Networking A research group claims that the median age of the users of a social networking website is less than 32 years old. In a random sample of 20 users, 5 are less than 32 years old, 13 are more than 32 years old, and 2 are 32 years old. At α = 0.05, can you support the research group’s claim? (Adapted from Pew Research Center)


15. Unit Size A renters’ organization claims that the median number of rooms in renter-occupied units is four. You randomly select 120 renter-occupied units and obtain the results shown below. At α = 0.05, can you reject the organization’s claim? (Adapted from U.S. Census Bureau)


TABLE FOR EXERCISE 15
Unit size | Number of units
Fewer than 4 rooms | 29
4 rooms | 38
More than 4 rooms | 53

TABLE FOR EXERCISE 16
Square footage | Number of units
Less than 1000 | 13
1000 | 2
More than 1000 | 7


16. Square Footage A renters’ organization claims that the median square footage of renter-occupied units is 1000 square feet. You randomly select 22 renter-occupied units and obtain the results shown above. At α = 0.10, can you reject the organization’s claim? (Adapted from U.S. Census Bureau)


17. Hourly Wages A labor organization claims that the median hourly wage of computer systems analysts is $41.93. In a random sample of 45 computer systems analysts, 18 earn less than $41.93 per hour, 25 earn more than $41.93 per hour, and 2 earn $41.93 per hour. At α = 0.01, can you reject the labor organization’s claim? (Adapted from U.S. Bureau of Labor Statistics)


18. Hourly Wages A labor organization claims that the median hourly wage of podiatrists is at least $60.01. In a random sample of 23 podiatrists, 17 earn less than $60.01 per hour, 5 earn more than $60.01 per hour, and 1 earns $60.01 per hour. At α = 0.05, can you reject the labor organization’s claim? (Adapted from U.S. Bureau of Labor Statistics)


19. Lower Back Pain A physician claims that lower back pain intensity scores will decrease after receiving acupuncture treatment. The table shows the lower back pain intensity scores for eight patients before and after receiving acupuncture for eight weeks. At α = 0.05, is there enough evidence to support the physician’s claim? (Adapted from Archives of Internal Medicine)


Patient | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Intensity score (before) | 59.2 | 46.3 | 65.4 | 74.0 | 79.3 | 81.6 | 44.4 | 59.1
Intensity score (after) | 12.4 | 22.5 | 18.6 | 59.3 | 70.1 | 70.2 | 13.2 | 25.9


20. Lower Back Pain A physician claims that lower back pain intensity scores will decrease after taking anti-inflammatory drugs. The table shows the lower back pain intensity scores for 12 patients before and after taking anti-inflammatory drugs for 8 weeks. At α = 0.05, is there enough evidence to support the physician’s claim? (Adapted from Archives of Internal Medicine)

Patient | 1 | 2 | 3 | 4 | 5 | 6
Intensity score (before) | 71.0 | 42.1 | 79.1 | 57.5 | 64.0 | 60.4
Intensity score (after) | 60.1 | 23.4 | 86.2 | 62.1 | 44.2 | 49.7

Patient | 7 | 8 | 9 | 10 | 11 | 12
Intensity score (before) | 68.3 | 95.2 | 48.1 | 78.6 | 65.4 | 59.9
Intensity score (after) | 58.3 | 72.6 | 51.8 | 82.5 | 63.2 | 47.9


21. Improving SAT Scores A tutoring agency claims that by completing a special course, students will improve their math SAT scores. In part of a study, 12 students take the math part of the SAT, complete the special course, then take the math part of the SAT again. The students’ scores are shown below. At α = 0.05, is there enough evidence to support the agency’s claim?

Student | 1 | 2 | 3 | 4 | 5 | 6
Score on first SAT | 300 | 450 | 350 | 430 | 300 | 470
Score on second SAT | 300 | 520 | 400 | 410 | 300 | 480

Student | 7 | 8 | 9 | 10 | 11 | 12
Score on first SAT | 530 | 200 | 200 | 350 | 360 | 250
Score on second SAT | 700 | 250 | 390 | 350 | 480 | 300




22. SAT Scores A guidance counselor claims that students who take the SAT twice will improve their scores the second time they take the SAT. The table shows both math SAT scores for 12 students who took the SAT twice. At α = 0.01, can you support the guidance counselor’s claim?

Student | 1 | 2 | 3 | 4 | 5 | 6
Score on first SAT | 440 | 510 | 420 | 450 | 620 | 450
Score on second SAT | 440 | 570 | 510 | 470 | 610 | 450

Student | 7 | 8 | 9 | 10 | 11 | 12
Score on first SAT | 350 | 470 | 320 | 510 | 630 | 570
Score on second SAT | 370 | 530 | 290 | 500 | 640 | 600


23. Feeling Your Age A research organization conducts a survey by randomly selecting adults and asking each, “How do you feel relative to your age?” The results are shown in the figure. (Adapted from Pew Research Center)

[Figure: responses to “How do you feel relative to your age?”, including “My age” and “Younger”]

(a) Use a sign test to test the null hypothesis that the proportion of adults who feel older is equal to the proportion of adults who feel younger. Assign a + sign to each adult who responded “older,” assign a − sign to each adult who responded “younger,” and assign a 0 to each adult who responded “my age.” Use α = 0.05.


(b) What can you conclude? 


Contacting Parents A research organization conducts a survey by randomly 
selecting adults and asking each, “How frequently do you contact your 
parents by phone?” The results are shown in the figure. (Adapied from Pew 
Research Center) 


Daily 


(a) Use a sign test to test the null hypothesis that the proportion of adults 
who contact their parents by phone weekly is equal to the proportion 
of adults who contact their parents by phone daily. Assign a + sign to 
each adult who responded “weekly,” assign a — sign to each adult who 
responded “daily,” and assign a 0 to each adult who responded “other.” 
Use a = 0.05. 


(b) What can you conclude? 


616 CHAPTER 11 Nonparametric Tests


Extending Concepts 


More on Sign Tests When you are using a sign test with n > 25 and the test is left-tailed, you can reject the null hypothesis when the test statistic

$$z = \frac{(x + 0.5) - 0.5n}{\dfrac{\sqrt{n}}{2}}$$

is less than or equal to the left-tailed critical value, where x is the smaller number of + or − signs. For a right-tailed test, you can reject the null hypothesis when the test statistic

$$z = \frac{(x - 0.5) - 0.5n}{\dfrac{\sqrt{n}}{2}}$$

is greater than or equal to the right-tailed critical value, where x is the larger number of + or − signs.
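The two large-sample formulas above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not part of the text's procedure: the function name `sign_test_z` and the counts in the usage lines (26 plus signs, 14 minus signs) are hypothetical.

```python
import math


def sign_test_z(x: int, n: int, tail: str) -> float:
    """Large-sample (n > 25) sign test statistic with a 0.5 continuity correction.

    x:    for tail="left", the smaller number of + or - signs;
          for tail="right", the larger number of + or - signs
    n:    total number of + and - signs (zeros are not counted)
    tail: "left" or "right"
    """
    if tail == "left":
        adjusted = x + 0.5   # the correction moves x toward the center, 0.5n
    elif tail == "right":
        adjusted = x - 0.5
    else:
        raise ValueError("tail must be 'left' or 'right'")
    return (adjusted - 0.5 * n) / (math.sqrt(n) / 2)


# Hypothetical counts: 26 plus signs and 14 minus signs, so n = 40.
print(round(sign_test_z(26, 40, "right"), 3))  # 1.739
print(round(sign_test_z(14, 40, "left"), 3))   # -1.739
```

The resulting z is then compared with the critical value exactly as described in the paragraph above.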


In Exercises 25-28, use a right-tailed test and (a) identify the claim and state H₀ and Hₐ, (b) find the critical value, (c) find the test statistic, (d) decide whether to reject or fail to reject the null hypothesis, and (e) interpret the decision in the context of the original claim.


25. Weekly Earnings A labor organization claims that the median weekly earnings of female workers is less than or equal to $765. To test this claim, you randomly select 50 female workers and ask each to provide her weekly earnings. The table shows the results. At α = 0.01, can you reject the organization’s claim? (Adapted from U.S. Bureau of Labor Statistics)

TABLE FOR EXERCISE 25
Weekly earnings | Number of workers
Less than $765 | 18
$765 | 3
More than $765 | 29

26. Weekly Earnings A labor organization claims that the median weekly earnings of male workers is greater than $950. To test this claim, you randomly select 70 male workers and ask each to provide his weekly earnings. The table shows the results. At α = 0.01, can you support the organization’s claim? (Adapted from U.S. Bureau of Labor Statistics)

TABLE FOR EXERCISE 26
Weekly earnings | Number of workers
Less than $950 | 23
$950 | 2
More than $950 | 45


27. Ages of Brides A marriage counselor claims that the median age of brides at the time of their first marriage is less than or equal to 27 years old. In a random sample of 65 brides, 24 are less than 27 years old, 35 are more than 27 years old, and 6 are 27 years old. At α = 0.05, can you reject the counselor’s claim? (Adapted from U.S. Census Bureau)

28. Ages of Grooms A marriage counselor claims that the median age of grooms at the time of their first marriage is greater than 28 years old. In a random sample of 56 grooms, 33 are less than 28 years old and 23 are more than 28 years old. At α = 0.05, can you support the counselor’s claim? (Adapted from U.S. Census Bureau)


What You Should Learn

• How to use the Wilcoxon signed-rank test to determine whether two dependent samples are selected from populations having the same distribution
• How to use the Wilcoxon rank sum test to determine whether two independent samples are selected from populations having the same distribution


Study Tip

Recall that the absolute value of a number is its value, disregarding its sign. A pair of vertical bars, | |, is used to denote absolute value. For example, |3| = 3 and |−7| = 7.

11.2 The Wilcoxon Tests


SECTION 11.2 The Wilcoxon Tests 617 


The Wilcoxon Signed-Rank Test • The Wilcoxon Rank Sum Test


The Wilcoxon Signed-Rank Test 


In this section, you will study the Wilcoxon signed-rank test and the Wilcoxon rank sum test. Unlike the sign test from Section 11.1, these two nonparametric tests take into account the magnitude, or size, of the data entries.

In Section 8.3, you used a t-test together with dependent samples to determine whether there was a difference between two populations. To use the t-test to test such a difference, you must assume (or know) that the dependent samples are randomly selected from populations having a normal distribution. But what should you do when the normality assumption cannot be made? Instead of using the two-sample t-test, you can use the Wilcoxon signed-rank test.


DEFINITION 


The Wilcoxon signed-rank test is a nonparametric test that can be used to determine whether two dependent samples were selected from populations having the same distribution.


GUIDELINES

Performing a Wilcoxon Signed-Rank Test

1. Verify that the samples are random and dependent.
2. Identify the claim. State the null and alternative hypotheses.
   In symbols: State H₀ and Hₐ.
3. Specify the level of significance.
   In symbols: Identify α.
4. Determine the sample size n, which is the number of pairs of data for which the difference is not 0.
5. Determine the critical value.
   In symbols: Use Table 9 in Appendix B.
6. Find the test statistic w_s.
   a. Complete a table using these headers: Sample 1, Sample 2, Difference, Absolute value, Rank, and Signed rank. The signed rank takes on the same sign as its corresponding difference.
   b. Find the sum of the positive ranks and the sum of the negative ranks.
   c. Select the smaller absolute value of the sums.
7. Make a decision to reject or fail to reject the null hypothesis.
   In symbols: If w_s is less than or equal to the critical value, then reject H₀. Otherwise, fail to reject H₀.
8. Interpret the decision in the context of the original claim.




Study Tip

Do not assign a rank to any difference of 0.

In the case of a tie between data entries, use the average of the corresponding ranks. For instance, when two data entries are tied for the fifth rank, use the average of 5 and 6, which is 5.5, as the rank for both entries. The next data entry will be assigned a rank of 7, not 6.

When three entries are tied for the fifth rank, use the average of 5, 6, and 7, which is 6, as the rank for all three data entries. The next data entry will be assigned a rank of 8.
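The tie rule in this Study Tip can be sketched as a small Python helper. This is an illustrative sketch only; the name `average_ranks` is a hypothetical choice, and the usage lines reproduce the two cases described above.

```python
def average_ranks(values):
    """Return the rank of each value (1 = smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Grow the run [start, end] while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Positions start..end would get ranks start+1..end+1; use their mean.
        shared = (start + 1 + end + 1) / 2
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


# Two entries tied for fifth rank share 5.5; the next entry gets 7.
print(average_ranks([1, 2, 3, 4, 7, 7, 9]))
# [1.0, 2.0, 3.0, 4.0, 5.5, 5.5, 7.0]

# Three entries tied for fifth rank share 6; the next entry gets 8.
print(average_ranks([1, 2, 3, 4, 7, 7, 7, 9]))
# [1.0, 2.0, 3.0, 4.0, 6.0, 6.0, 6.0, 8.0]
```

Because the ranks are returned in the original order of the data, the same helper works whether or not the data were sorted first.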


Performing a Wilcoxon Signed-Rank Test 


A golf club manufacturer claims that golfers can lower their scores by using the manufacturer’s newly designed golf clubs. The table shows the scores of 10 golfers while using the old design and while using the new design on the same golf course. At α = 0.05, can you support the manufacturer’s claim?

Golfer | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Score (old design) | 89 | 84 | 96 | 74 | 91 | 85 | 95 | 82 | 92 | 81
Score (new design) | 83 | 83 | 92 | 76 | 91 | 80 | 87 | 85 | 90 | 77


SOLUTION
The claim is “golfers can lower their scores.” To test this claim, use the null and alternative hypotheses below.

H₀: The new design does not lower scores.

Hₐ: The new design lowers scores. (Claim)

This Wilcoxon signed-rank test is a one-tailed test with α = 0.05, and because one data pair has a difference of 0, n = 9 instead of 10. From Table 9 in Appendix B, the critical value is 8. To find the test statistic w_s, complete a table as shown below.


Score (old design) | Score (new design) | Difference | Absolute value | Rank | Signed rank
89 | 83 | 6 | 6 | 8 | 8
84 | 83 | 1 | 1 | 1 | 1
96 | 92 | 4 | 4 | 5.5 | 5.5
74 | 76 | −2 | 2 | 2.5 | −2.5
91 | 91 | 0 | - | - | -
85 | 80 | 5 | 5 | 7 | 7
95 | 87 | 8 | 8 | 9 | 9
82 | 85 | −3 | 3 | 4 | −4
92 | 90 | 2 | 2 | 2.5 | 2.5
81 | 77 | 4 | 4 | 5.5 | 5.5


The sum of the negative ranks is
−2.5 + (−4) = −6.5.

The sum of the positive ranks is
8 + 1 + 5.5 + 7 + 9 + 2.5 + 5.5 = 38.5.

The test statistic is the smaller absolute value of these two sums. Because |−6.5| < |38.5|, the test statistic is w_s = 6.5. Because the test statistic is less than the critical value, that is, 6.5 < 8, you reject the null hypothesis.


Interpretation There is enough evidence at the 5% level of significance 
to support the claim that golfers can lower their scores by using the newly 
designed clubs. 
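As a check on the arithmetic, the sketch below recomputes this example's test statistic in Python. The helper names (`average_ranks`, `signed_rank_statistic`) are hypothetical; differences are taken as old design minus new design, as in the table, and the critical value 8 is copied from the example rather than computed.

```python
def average_ranks(values):
    """Ranks 1..n in ascending order; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end + 2) / 2  # mean of ranks start+1..end+1
        start = end + 1
    return ranks


def signed_rank_statistic(sample1, sample2):
    """Return (n, sum of positive ranks, sum of negative ranks, w_s) for paired data."""
    # Differences of 0 are dropped, so n counts only the nonzero differences.
    diffs = [a - b for a, b in zip(sample1, sample2) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    # Each signed rank takes the sign of its difference.
    positive = sum(r for d, r in zip(diffs, ranks) if d > 0)
    negative = -sum(r for d, r in zip(diffs, ranks) if d < 0)
    w_s = min(abs(positive), abs(negative))
    return len(diffs), positive, negative, w_s


old_design = [89, 84, 96, 74, 91, 85, 95, 82, 92, 81]
new_design = [83, 83, 92, 76, 91, 80, 87, 85, 90, 77]
n, pos, neg, w_s = signed_rank_statistic(old_design, new_design)
print(n, pos, neg, w_s)  # 9 38.5 -6.5 6.5

critical_value = 8  # from Table 9 in Appendix B, as stated in the example
print("reject H0" if w_s <= critical_value else "fail to reject H0")  # reject H0
```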


Picturing the World

To help determine when knee arthroscopy patients can resume driving after surgery, the driving reaction times (in milliseconds) of 10 right knee arthroscopy patients were measured before surgery and 4 weeks after surgery using a computer-linked car simulator. The table shows the results. (Adapted from Knee Surgery, Sports Traumatology, Arthroscopy Journal)

Patient | Reaction time before surgery | Reaction time 4 weeks after surgery
1 | 720 | 730
2 | 750 | 645
3 | 735 | 745
4 | 730 | 640
5 | 755 | 660
6 | 745 | 670
7 | 730 | 650
8 | 725 | 730
9 | 770 | 675
10 | 700 | 705

At α = 0.05, can you conclude that the reaction times changed significantly four weeks after surgery?


Study Tip 


Use the Wilcoxon 
signed-rank test 
for dependent samples and the 
Wilcoxon rank sum test for 
independent samples. 


TRY IT YOURSELF 1 


A quality control inspector wants to test the claim that a spray-on water repellent is effective. To test this claim, he selects 12 pieces of fabric, sprays water on each, and measures the amount of water repelled (in milliliters). He then applies the water repellent and repeats the experiment. The table shows the results. At α = 0.01, can he conclude that the water repellent is effective?

Fabric | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12
No repellent | 8 | 7 | 2 | 4 | 6 | 10 | 9 | 5 | 9 | 11 | 8 | 4
Repellent applied | 15 | 12 | 11 | 6 | 6 | 8 | 8 | 6 | 12 | 8 | 14 | 8

Answer: Page T1


The Wilcoxon Rank Sum Test 


In Sections 8.1 and 8.2, you used a z-test (σ₁ and σ₂ known) or a t-test (σ₁ and σ₂ unknown) together with independent samples to determine whether there was a difference between two populations. To use a z-test or a t-test to test such a difference, you must assume (or know) that the samples are random and independent, and either the populations are normally distributed or each sample size is at least 30. But what should you do when the normality and sample size assumptions cannot be made? You can still compare the populations using the Wilcoxon rank sum test.


DEFINITION 


The Wilcoxon rank sum test is a nonparametric test that can be used to determine whether two independent samples were selected from populations having the same distribution.


A requirement for the Wilcoxon rank sum test is that the sample size of each sample must be at least 10. When calculating the test statistic for the Wilcoxon rank sum test, let n₁ represent the sample size of the smaller sample and n₂ represent the sample size of the larger sample. When the two samples have the same size, it does not matter which one is n₁ or n₂.

When calculating the sum of the ranks R, combine both samples and rank the combined data. Then sum the ranks for the smaller of the two samples. When the two samples have the same size, you can use the ranks from either sample, but you must use the ranks from the sample you associate with n₁.


Test Statistic for the Wilcoxon Rank Sum Test

For two independent samples, the test statistic z for the Wilcoxon rank sum test is

$$z = \frac{R - \mu_R}{\sigma_R}$$

where R is the sum of the ranks for the smaller sample,

$$\mu_R = \frac{n_1(n_1 + n_2 + 1)}{2}, \quad \text{and} \quad \sigma_R = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}.$$
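These three formulas translate directly into code. The sketch below (hypothetical function name `rank_sum_z`) assumes R has already been found from the combined ranking, and plugs in the values from the pharmaceutical sales example in this section.

```python
import math


def rank_sum_z(R, n1, n2):
    """Wilcoxon rank sum z statistic; R is the rank sum of the sample of size n1 (n1 <= n2)."""
    mu_R = n1 * (n1 + n2 + 1) / 2
    sigma_R = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (R - mu_R) / sigma_R, mu_R, sigma_R


# Values from the pharmaceutical sales example: R = 138, n1 = 10, n2 = 12.
z, mu_R, sigma_R = rank_sum_z(138, 10, 12)
print(mu_R, round(sigma_R, 2), round(z, 2))  # 115.0 15.17 1.52
```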




GUIDELINES

Performing a Wilcoxon Rank Sum Test

1. Verify that the samples are random and independent.
2. Identify the claim. State the null and alternative hypotheses.
   In symbols: State H₀ and Hₐ.
3. Specify the level of significance.
   In symbols: Identify α.
4. Determine the critical value(s) and the rejection region(s).
   In symbols: Use Table 4 in Appendix B.
5. Determine the sample sizes.
6. Find the sum of the ranks for the smaller sample.
   a. List the combined data in ascending order.
   b. Rank the combined data.
   c. Add the ranks for the smaller sample to obtain R.
7. Find the test statistic and sketch the sampling distribution.
   In symbols: z = (R − μ_R) / σ_R
8. Make a decision to reject or fail to reject the null hypothesis.
   In symbols: If z is in the rejection region, then reject H₀. Otherwise, fail to reject H₀.
9. Interpret the decision in the context of the original claim.


Performing a Wilcoxon Rank Sum Test 


The table shows the earnings (in thousands of dollars) of a random sample of 10 male and 12 female pharmaceutical sales representatives. At α = 0.10, can you conclude that there is a difference between the males’ and females’ earnings?

Male earnings | 78 | 93 | 114 | 101 | 98 | 94 | 86 | 95 | 117 | 99
Female earnings | 86 | 77 | 101 | 93 | 85 | 98 | 91 | 87 | 84 | 97 | 100 | 90


SOLUTION
The claim is “there is a difference between the males’ and females’ earnings.” To test this claim, use the null and alternative hypotheses below.
H₀: There is no difference between the males’ and the females’ earnings.
Hₐ: There is a difference between the males’ and the females’ earnings. (Claim)

Because the test is a two-tailed test with α = 0.10, the critical values are −z₀ = −1.645 and z₀ = 1.645. The rejection regions are z < −1.645 and z > 1.645.




The sample size for men is 10 and the sample size for women is 12. Because 10 < 12, n₁ = 10 and n₂ = 12. Before calculating the test statistic, you must find the values of R, μ_R, and σ_R. The table shows the combined data listed in ascending order and the corresponding ranks.

Ordered data | Sample | Rank
77 | F | 1
78 | M | 2
84 | F | 3
85 | F | 4
86 | M | 5.5
86 | F | 5.5
87 | F | 7
90 | F | 8
91 | F | 9
93 | M | 10.5
93 | F | 10.5
94 | M | 12
95 | M | 13
97 | F | 14
98 | M | 15.5
98 | F | 15.5
99 | M | 17
100 | F | 18
101 | M | 19.5
101 | F | 19.5
114 | M | 21
117 | M | 22

Study Tip

Remember that in the case of a tie between data entries, use the average of the corresponding ranks.

Because the smaller sample is the sample of males, R is the sum of the male rankings.

R = 2 + 5.5 + 10.5 + 12 + 13 + 15.5 + 17 + 19.5 + 21 + 22 = 138

Using n₁ = 10 and n₂ = 12, you can find μ_R and σ_R as follows.

$$\mu_R = \frac{n_1(n_1 + n_2 + 1)}{2} = \frac{10(10 + 12 + 1)}{2} = \frac{230}{2} = 115$$




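$$\sigma_R = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}} = \sqrt{\frac{10(12)(10 + 12 + 1)}{12}} = \sqrt{230} \approx 15.17$$

When R = 138, μ_R = 115, and σ_R ≈ 15.17, the test statistic is

$$z = \frac{R - \mu_R}{\sigma_R} = \frac{138 - 115}{15.17} \approx 1.52.$$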


The figure shows the location of the rejection regions and the test statistic z. Because z is not in the rejection region, you fail to reject the null hypothesis.

[Figure: standard normal curve with ½α = 0.05 in each tail, critical values −z₀ = −1.645 and z₀ = 1.645, and the test statistic z = 1.52 between them]


Interpretation There is not enough evidence at the 10% level of significance 
to conclude that there is a difference between the males’ and females’ earnings. 
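The whole procedure, from ranking the combined data to comparing z with the critical values, can be sketched in Python as below. The function names are illustrative assumptions, and the two-tailed critical value comes from `statistics.NormalDist().inv_cdf` in the standard library (Python 3.8+) rather than from a table.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Ranks 1..n in ascending order; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end + 2) / 2
        start = end + 1
    return ranks


def rank_sum_test(sample_a, sample_b, alpha):
    """Two-tailed Wilcoxon rank sum test using the normal approximation."""
    # Let the first sample be the smaller one (n1 <= n2).
    if len(sample_a) > len(sample_b):
        sample_a, sample_b = sample_b, sample_a
    n1, n2 = len(sample_a), len(sample_b)
    # Rank the combined data, then add the ranks that belong to the smaller sample.
    ranks = average_ranks(list(sample_a) + list(sample_b))
    R = sum(ranks[:n1])
    mu_R = n1 * (n1 + n2 + 1) / 2
    sigma_R = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (R - mu_R) / sigma_R
    z0 = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    return R, z, z0, abs(z) >= z0


male = [78, 93, 114, 101, 98, 94, 86, 95, 117, 99]
female = [86, 77, 101, 93, 85, 98, 91, 87, 84, 97, 100, 90]
R, z, z0, reject = rank_sum_test(male, female, alpha=0.10)
print(R, round(z, 2), round(z0, 3), reject)  # 138.0 1.52 1.645 False
```

Computing z₀ from the normal distribution instead of a table is a design convenience; it gives the same 1.645 used in the example.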


TRY IT YOURSELF 2 


You are investigating the automobile insurance claims paid (in thousands of dollars) by two insurance companies. The table shows a random sample of 12 claims paid by each of the two insurance companies. At α = 0.05, can you conclude that there is a difference in the claims paid by the companies?

Company A | 6.2 | 9.9 | 10.6 | 3.0 | 2.5 | 5.8
Company B | 7.3 | 10.8 | 5.6 | 4.1 | 3.4 | 1.7

Company A | 4.5 | 3.9 | 6.5 | 6.0 | 7.4 | 6.3
Company B | 1.8 | 3.0 | 2.2 | 4.4 | 4.7 | 5.3

Answer: Page T1




11.2 EXERCISES


For Extra Help: MyLab Statistics 


Building Basic Skills and Vocabulary 


1. How do you know whether to use a Wilcoxon signed-rank test or a Wilcoxon 
rank sum test? 


2. What is the requirement for the sample size of each sample when using the 
Wilcoxon rank sum test? 


Using and Interpreting Concepts 


Performing a Wilcoxon Test In Exercises 3-8,
(a) identify the claim and state H₀ and Hₐ.
(b) decide whether to use a Wilcoxon signed-rank test or a Wilcoxon rank sum test.
(c) find the critical value(s).
(d) find the test statistic.
(e) decide whether to reject or fail to reject the null hypothesis.
(f) interpret the decision in the context of the original claim.


3. Calcium Supplements and Blood Pressure In a study testing the effects of calcium supplements on blood pressure in men, 12 men were randomly chosen and given a calcium supplement for 12 weeks. The table shows the measurements for each subject’s diastolic blood pressure taken before and after the 12-week treatment period. At α = 0.01, can you reject the claim that there was no reduction in diastolic blood pressure? (Adapted from The Journal of the American Medical Association)

Patient | 1 | 2 | 3 | 4 | 5 | 6
Before treatment | 108 | 109 | 120 | 129 | 112 | 111
After treatment | 99 | 115 | 105 | 116 | 115 | 117

Patient | 7 | 8 | 9 | 10 | 11 | 12
Before treatment | 117 | 135 | 124 | 118 | 130 | 115
After treatment | 108 | 122 | 120 | 126 | 128 | 106


4. Wholesale Trade and Manufacturing A private industry analyst claims that there is no difference in the salaries earned by workers in the wholesale trade and manufacturing industries. The table shows the salaries (in thousands of dollars) of a random sample of 10 wholesale trade workers and 10 manufacturing workers. At α = 0.10, can you reject the analyst’s claim? (Adapted from U.S. Bureau of Economic Analysis)

Wholesale trade | 70 | 66 | 65 | 80 | 62 | 69 | 73 | 77 | 74 | 72
Manufacturing | 71 | 67 | 56 | 74 | 54 | 65 | 76 | 58 | 64 | 52




5. Earnings by Degree A college administrator claims that there is a difference in the earnings of people with bachelor’s degrees and those with advanced degrees. The table shows the earnings (in thousands of dollars) of a random sample of 11 people with bachelor’s degrees and 10 people with advanced degrees. At α = 0.05, is there enough evidence to support the administrator’s claim? (Adapted from U.S. Census Bureau)

Bachelor’s degree | 62 | 58 | 71 | 84 | 78 | 58 | 52 | 64 | 68 | 60 | 62
Advanced degree | 88 | 91 | 99 | 85 | 90 | 91 | 98 | 98 | 95 | 87


6. Headaches A medical researcher wants to determine whether a new drug affects the number of headache hours experienced by headache sufferers. To do so, the researcher randomly selects seven patients and asks each to give the number of headache hours (per day) each experiences before and after taking the drug. The table shows the results. At α = 0.05, can the researcher conclude that the new drug affects the number of headache hours?

Patient | 1 | 2 | 3 | 4 | 5 | 6 | 7
Headache hours (before) | 0.8 | 2.4 | 2.8 | 2.6 | 2.7 | 0.9 | 1.2
Headache hours (after) | 1.6 | 1.3 | 1.6 | 1.4 | 1.5 | 1.6 | 1.7


7. Teacher Salaries A teacher’s union representative claims that there is a difference in the salaries earned by teachers in Wisconsin and Michigan. The table shows the salaries (in thousands of dollars) of a random sample of 11 teachers from Wisconsin and 12 teachers from Michigan. At α = 0.05, is there enough evidence to support the representative’s claim? (Adapted from National Education Association)

Wisconsin | 55 | 59 | 49 | 56 | 51 | 61 | 55 | 61 | 53 | 47 | 52
Michigan | 61 | 65 | 55 | 62 | 57 | 67 | 61 | 67 | 59 | 53 | 58 | 76


8. Heart Rate A physician wants to determine whether an experimental medication affects an individual’s heart rate. The physician randomly selects 15 patients and measures the heart rate of each. The subjects then take the medication and have their heart rates measured after one hour. The table shows the results. At α = 0.05, can the physician conclude that the experimental medication affects an individual’s heart rate?

Patient | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Heart rate (before) | 72 | 81 | 75 | 76 | 79 | 74 | 65 | 67
Heart rate (after) | 73 | 80 | 75 | 79 | 74 | 76 | 73 | 67

Patient | 9 | 10 | 11 | 12 | 13 | 14 | 15
Heart rate (before) | 76 | 83 | 66 | 75 | 76 | 78 | 68
Heart rate (after) | 74 | 77 | 70 | 77 | 76 | 75 | 74


Extending Concepts 
Wilcoxon Signed-Rank Test for n > 30 When you are performing a Wilcoxon signed-rank test and the sample size n is greater than 30, you can use the Standard Normal Table and the formula below to find the test statistic.

$$z = \frac{w_s - \dfrac{n(n+1)}{4}}{\sqrt{\dfrac{n(n+1)(2n+1)}{24}}}$$
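A short sketch of this large-sample statistic, assuming w_s has already been found as in the signed-rank guidelines. The function name `signed_rank_z` and the values n = 40 and w_s = 250 are hypothetical, chosen only to show the arithmetic.

```python
import math


def signed_rank_z(w_s, n):
    """Normal-approximation statistic for the Wilcoxon signed-rank test (n > 30)."""
    mean = n * (n + 1) / 4
    std = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (w_s - mean) / std


# Hypothetical values: n = 40 nonzero differences and w_s = 250.
print(round(signed_rank_z(250, 40), 3))  # -2.151
```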


In Exercises 9 and 10, perform the Wilcoxon signed-rank test using the test statistic for n > 30.


9. Fuel Additive A petroleum engineer wants to know whether a certain fuel additive improves a car’s gas mileage. To decide, the engineer records the gas mileages (in miles per gallon) of 33 randomly selected cars with and without the fuel additive. The table shows the results. At α = 0.10, can the engineer conclude that the gas mileage is improved?

Car | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11
Without additive | 36.4 | 36.4 | 36.6 | 36.6 | 36.8 | 36.9 | 37.0 | 37.1 | 37.2 | 37.2 | 36.7
With additive | 36.7 | 36.9 | 37.0 | 37.5 | 38.0 | 38.1 | 38.4 | 38.7 | 38.8 | 38.9 | 36.3

Car | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22
Without additive | 37.5 | 37.6 | 37.8 | 37.9 | 37.9 | 38.1 | 38.4 | 40.2 | 40.5 | 40.9 | 35.0
With additive | 38.9 | 39.0 | 39.1 | 39.4 | 39.4 | 39.5 | 39.8 | 40.0 | 40.0 | 40.1 | 36.3

Car | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33
Without additive | 32.7 | 33.6 | 34.2 | 35.1 | 35.2 | 35.3 | 35.5 | 35.9 | 36.0 | 36.1 | 37.2
With additive | 32.8 | 34.2 | 34.7 | 34.9 | 34.9 | 35.3 | 35.9 | 36.4 | 36.6 | 36.6 | 38.3

10. Fuel Additive A petroleum engineer claims that a fuel additive improves gas mileage. The table shows the gas mileages (in miles per gallon) of 32 randomly selected cars measured with and without the fuel additive. Test the petroleum engineer’s claim at α = 0.05.

Car | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11
Without additive | 34.0 | 34.2 | 34.4 | 34.4 | 34.6 | 34.8 | 35.6 | 35.7 | 30.2 | 31.6 | 32.3
With additive | 36.6 | 36.7 | 37.2 | 37.2 | 37.3 | 37.4 | 37.6 | 37.7 | 34.2 | 34.9 | 34.9

Car | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22
Without additive | 33.0 | 33.1 | 33.7 | 33.7 | 33.8 | 35.7 | 36.1 | 36.1 | 36.6 | 36.6 | 36.8
With additive | 34.9 | 35.7 | 36.0 | 36.2 | 36.5 | 37.8 | 38.1 | 38.2 | 38.3 | 38.3 | 38.7

Car | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32
Without additive | 37.1 | 37.1 | 37.2 | 37.9 | 37.9 | 38.0 | 38.0 | 38.4 | 38.8 | 42.1
With additive | 38.8 | 38.9 | 39.1 | 39.1 | 39.2 | 39.4 | 39.8 | 40.3 | 40.8 | 43.2


College Ranks


Each year, Forbes and the Center for College Affordability and Productivity 
(CCAP) release a list of the best colleges in the United States. Over 600 colleges 
and universities are ranked according to factors that fall into one of five categories. 


1. Postgraduate success, which is based on salary of alumni by school and the 
alumni who appear on CCAP’s America’s Leaders list 


2. Student debt, which is based on three components: average federal student 
loan debt load, student loan default rates, and predicted versus actual percent 
of students taking federal loans 


3. Student satisfaction, which is based on student retention rates and student 
evaluations of professors 


4. Graduation rate, which is based on how many students actually finish their 
degrees in four years and the actual versus predicted rate 


5. Academic success, which is based on students who have won competitive 
scholarships and fellowships, and students who have gone on to earn Ph.D.s 


The table shows the student populations for randomly selected colleges by region 
on the 2016 list. 


Student populations

Northeast | Midwest | South | West
1,805 | 24,766 | 6,621 | 1,498
9,181 | 2,948 | 14,769 | 1,394
14,317 | 1,459 | 29,175 | 1,144
2,113 | 3,688 | 15,984 | 8,132
20,445 | 3,418 | 2,850 | 12,820
1,632 | 14,747 | 27,511 | 50,320
5,123 | 14,906 | 24,932 | 31,354
755 | 5,931 | 49,610 | 2,127
15,117 | 2,791 | 10,033 | 19,934
18,090 | 11,458 | 1,575 | 31,332


EXERCISES

1. Construct a side-by-side box-and-whisker plot for the four regions. Do any of the median student populations appear to be the same? Do any appear to be different?

In Exercises 2-5, use the sign test to test the claim. What can you conclude? Use α = 0.05.

2. The median student population at a college in the Northeast is less than or equal to 7000.

3. The median student population at a college in the [text illegible in source].

4. The median student population at a college in the South is 10,000.

5. The median student population at a college in the West is different from 8000.

In Exercises 6 and 7, use the Wilcoxon rank sum test to test the claim. Use α = 0.01.

6. There is no difference between student populations for colleges in the Midwest and colleges in the West.

7. There is a difference between student populations for colleges in the Northeast and colleges in the South.




SECTION 11.3. The Kruskal-Wallis Test 627 


What You Should Learn

• How to use the Kruskal-Wallis test to determine whether three or more samples were selected from populations having the same distribution

11.3 The Kruskal-Wallis Test

The Kruskal-Wallis Test


In Section 10.4, you learned how to use one-way ANOVA techniques to compare 
the means of three or more populations. When using one-way ANOVA, you 
should verify that each independent sample is selected from a population that is 
normally, or approximately normally, distributed. When you cannot verify that 
the populations are normal, you can still compare the distributions of three or 
more populations. To do so, you can use the Kruskal-Wallis test. 


DEFINITION 


The Kruskal-Wallis test is a nonparametric test that can be used to determine whether three or more independent samples were selected from populations having the same distribution.


For a Kruskal-Wallis test, the null and alternative hypotheses are always similar to these statements.

H₀: All of the populations have the same distribution.

Hₐ: At least one population has a distribution that is different from the others.


The conditions for using the Kruskal-Wallis test are that the samples must be random and independent, and the size of each sample must be at least 5. If these conditions are met, then the sampling distribution for the Kruskal-Wallis test is approximated by a chi-square distribution with k − 1 degrees of freedom, where k is the number of samples. You can calculate the Kruskal-Wallis test statistic using the formula below.


Test Statistic for the Kruskal-Wallis Test

For three or more independent samples, the test statistic for the Kruskal-Wallis test is

$$H = \frac{12}{N(N+1)}\left(\frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \cdots + \frac{R_k^2}{n_k}\right) - 3(N+1)$$

where
k is the number of samples,
nᵢ is the size of the ith sample,
N is the sum of the sample sizes, and
Rᵢ is the sum of the ranks of the ith sample.


Performing a Kruskal-Wallis test consists of combining and ranking the 
sample data. The data are then separated according to sample and the sum of the 
ranks of each sample is calculated. 




These sums are then used to calculate the test statistic H, which is an 
approximation of the variance of the rank sums. When the samples are selected 
from populations having the same distribution, the sums of the ranks will be 
approximately equal, H will be small, and you should fail to reject the null 
hypothesis. 

When the samples are selected from populations not having the same 
distribution, the sums of the ranks will be quite different, H will be large, and you 
should reject the null hypothesis. 

Because you only reject the null hypothesis when H is significantly large, the 
Kruskal-Wallis test is always a right-tailed test. 
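The combine, rank, separate, and sum steps described above can be sketched in Python as below. The helper names are hypothetical, and the data are the crime reports from the precinct example in this section (the seventh 101st Precinct week is read as 57, consistent with its rank of 14 in that example's ranked table).

```python
def average_ranks(values):
    """Ranks 1..n in ascending order; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end + 2) / 2
        start = end + 1
    return ranks


def kruskal_wallis(*samples):
    """Return (rank sums R_i, test statistic H) for k independent samples."""
    combined = [x for sample in samples for x in sample]
    ranks = average_ranks(combined)
    N = len(combined)
    rank_sums = []
    start = 0
    for sample in samples:
        # The ranks of sample i occupy the next len(sample) slots of the combined list.
        rank_sums.append(sum(ranks[start:start + len(sample)]))
        start += len(sample)
    total = sum(R ** 2 / len(s) for R, s in zip(rank_sums, samples))
    H = 12 / (N * (N + 1)) * total - 3 * (N + 1)
    return rank_sums, H


precinct_101 = [60, 52, 49, 52, 50, 48, 57, 45, 44, 56]
precinct_106 = [65, 55, 64, 66, 53, 58, 50, 54, 70, 62]
precinct_113 = [69, 51, 70, 61, 67, 65, 62, 59, 60, 63]
rank_sums, H = kruskal_wallis(precinct_101, precinct_106, precinct_113)
print(rank_sums, round(H, 3))  # [77.0, 177.0, 211.0] 12.521
```

A large H means the rank sums are far apart, which is why only the right tail leads to rejection.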


GUIDELINES

Performing a Kruskal-Wallis Test

1. Verify that the samples are random and independent, and each sample size is at least 5.
2. Identify the claim. State the null and alternative hypotheses.
   In symbols: State H₀ and Hₐ.
3. Specify the level of significance.
   In symbols: Identify α.
4. Identify the degrees of freedom.
   In symbols: d.f. = k − 1
5. Determine the critical value and the rejection region.
   In symbols: Use Table 6 in Appendix B.
6. Find the sum of the ranks for each sample.
   a. List the combined data in ascending order.
   b. Rank the combined data.
7. Find the test statistic and sketch the sampling distribution.
   In symbols:
   $$H = \frac{12}{N(N+1)}\left(\frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \cdots + \frac{R_k^2}{n_k}\right) - 3(N+1)$$
8. Make a decision to reject or fail to reject the null hypothesis.
   In symbols: If H is in the rejection region, then reject H₀. Otherwise, fail to reject H₀.
9. Interpret the decision in the context of the original claim.
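The decision in the guidelines above can also be made with a P-value instead of a table lookup. Python's standard library has no chi-square distribution, so this sketch (hypothetical function name `chi2_right_tail_even_df`) assumes an even number of degrees of freedom, where the right-tail area has a closed form; for odd d.f., a package routine such as SciPy's `scipy.stats.chi2.sf` would be needed instead.

```python
import math


def chi2_right_tail_even_df(x, df):
    """Area to the right of x under a chi-square curve with an even number of df.

    Uses the closed form for df = 2m:
        P(X >= x) = exp(-x/2) * sum_{i=0}^{m-1} (x/2)**i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form covers positive even df only")
    half = x / 2
    term = 1.0   # (x/2)**0 / 0!
    total = 0.0
    for i in range(df // 2):
        if i > 0:
            term *= half / i   # turns (x/2)**(i-1)/(i-1)! into (x/2)**i/i!
        total += term
    return math.exp(-half) * total


# Three samples give d.f. = 3 - 1 = 2.
print(round(chi2_right_tail_even_df(9.210, 2), 4))   # 0.01
print(round(chi2_right_tail_even_df(12.521, 2), 4))  # 0.0019
```

When the printed area is at most α, H lies in the rejection region; this is equivalent to comparing H with the Table 6 critical value.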


Performing a Kruskal-Wallis Test

You want to compare the numbers of crimes reported in three police precincts in a city. To do so, you randomly select 10 weeks for each precinct and record the numbers of crimes reported. The table shows the results. At α = 0.01, can you conclude that the distribution of the numbers of crimes reported in at least one precinct is different from the others?

Number of crimes reported for the week

101st Precinct (Sample 1) | 106th Precinct (Sample 2) | 113th Precinct (Sample 3)
60 | 65 | 69
52 | 55 | 51
49 | 64 | 70
52 | 66 | 61
50 | 53 | 67
48 | 58 | 65
57 | 50 | 62
45 | 54 | 59
44 | 70 | 60
56 | 62 | 63

SOLUTION

You want to test the claim that the distribution of the numbers of crimes reported in at least one precinct is different from the others. The null and alternative hypotheses are as follows.

H₀: The distribution of the numbers of crimes reported is the same in all three precincts.

Hₐ: The distribution of the numbers of crimes reported in at least one precinct is different from the others. (Claim)

The test is a right-tailed test with α = 0.01 and d.f. = k − 1 = 3 − 1 = 2. From Table 6 in Appendix B, the critical value is χ²₀ = 9.210. The rejection region is χ² > 9.210. Before calculating the test statistic, you must find the sum of the ranks for each sample. The table shows the combined data listed in ascending order and the corresponding ranks.

Ordered data | Sample | Rank | Ordered data | Sample | Rank | Ordered data | Sample | Rank
44 | 101st | 1 | 54 | 106th | 11 | 62 | 113th | 20.5
45 | 101st | 2 | 55 | 106th | 12 | 63 | 113th | 22
48 | 101st | 3 | 56 | 101st | 13 | 64 | 106th | 23
49 | 101st | 4 | 57 | 101st | 14 | 65 | 106th | 24.5
50 | 101st | 5.5 | 58 | 106th | 15 | 65 | 113th | 24.5
50 | 106th | 5.5 | 59 | 113th | 16 | 66 | 106th | 26
51 | 113th | 7 | 60 | 101st | 17.5 | 67 | 113th | 27
52 | 101st | 8.5 | 60 | 113th | 17.5 | 69 | 113th | 28
52 | 101st | 8.5 | 61 | 113th | 19 | 70 | 106th | 29.5
53 | 106th | 10 | 62 | 106th | 20.5 | 70 | 113th | 29.5


Picturing the World

The randomly collected data below were used to compare the water temperatures (in degrees Fahrenheit) of cities bordering the Gulf of Mexico. (Adapted from National Oceanographic Data Center)


The sum of the ranks for each sample is as follows.
R₁ = 1 + 2 + 3 + 4 + 5.5 + 8.5 + 8.5 + 13 + 14 + 17.5 = 77
R₂ = 5.5 + 10 + 11 + 12 + 15 + 20.5 + 23 + 24.5 + 26 + 29.5 = 177
R₃ = 7 + 16 + 17.5 + 19 + 20.5 + 22 + 24.5 + 27 + 28 + 29.5 = 211

Using these sums and the values n₁ = 10, n₂ = 10, n₃ = 10, and N = 30, the test statistic is

$$H = \frac{12}{30(30+1)}\left(\frac{77^2}{10} + \frac{177^2}{10} + \frac{211^2}{10}\right) - 3(30+1) \approx 12.521.$$


The figure shows the location of the rejection region and the test statistic H. Because H is in the rejection region, you reject the null hypothesis.

[Figure: chi-square distribution with d.f. = 2, rejection region χ² > 9.210, and the test statistic H = 12.521 inside the rejection region]

Interpretation There is enough evidence at the 1% level of significance to support the claim that the distribution of the numbers of crimes reported in at least one precinct is different from the others.

[Table for Picturing the World: water temperatures (in degrees Fahrenheit) at sites in FL (Sample 1), LA (Sample 2), and AL (Sample 3)]

At α = 0.05, can you conclude that at least one temperature distribution is different from the others?

TRY IT YOURSELF 1

You want to compare the salaries of veterinarians who work in Texas, Florida, and California. To compare the salaries, you randomly select several veterinarians in each state and record their salaries. The table shows the salaries (in thousands of dollars). At α = 0.05, can you conclude that the distribution of the veterinarians’ salaries in at least one state is different from the others? (Adapted from U.S. Bureau of Labor Statistics)


Sample salaries (in thousands of dollars) 


TX FL CA 
(Sample 1) (Sample 2) (Sample 3)
85.3 143.3 111.3 
149.9 135.9 83.4 
97.9 121.6 126.8 
91.0 80.4 146.1 
89.6 116.6 154.0 
147.7 106.7 160.2 
63.3 84.7 57.6 
74.8 95.0 #13:2 
118.7 105.3 131.0 
101.1 


Answer: Page T1 


SECTION 11.3 The Kruskal-Wallis Test 631 


11.3 EXERCISES 


For Extra Help: MyLab Statistics 


Building Basic Skills and Vocabulary 


1. What are the conditions for using a Kruskal-Wallis test? 


2. Explain why the Kruskal-Wallis test is always a right-tailed test. 


Using and Interpreting Concepts 


Performing a Kruskal-Wallis Test In Exercises 3-6, (a) identify the claim and state H₀ and Hₐ, (b) find the critical value and identify the rejection region, (c) find the test statistic H, (d) decide whether to reject or fail to reject the null hypothesis, and (e) interpret the decision in the context of the original claim.


3. Home Insurance The table shows the annual premiums for a random sample of home insurance policies in Connecticut, Massachusetts, and Virginia. At α = 0.05, can you conclude that the distribution of the annual premiums in at least one state is different from the others? (Adapted from National Association of Insurance Commissioners)


State           Annual premium (in dollars)
Connecticut     1303  1098  1263  1413  1538  1179  1320
Massachusetts   1382  1302  1257  1572  1387  1166  1034
Virginia        1035   950   766   845  1132   838   755


4. Hourly Rates A researcher wants to determine whether there is a difference in the hourly pay rates for registered nurses in Indiana, Kentucky, and Ohio. The researcher randomly selects several registered nurses in each state and records the hourly pay rate for each. The table shows the results. At α = 0.05, can the researcher conclude that the distribution of the hourly pay rates of registered nurses in at least one state is different from the others? (Adapted from U.S. Bureau of Labor Statistics)


State      Hourly pay rate (in dollars)
Indiana    28.83  29.28  27.68  28.43  31.27  26.13  30.47
Kentucky   27.77  26.40  28.92  31.02  29.37  32.42  25.42
Ohio       27.84  32.24  33.64  33.91  27.34  29.89


5. Annual Salaries The table shows the annual salaries for a random sample of private industry workers in Kentucky, North Carolina, South Carolina, and West Virginia. At α = 0.10, can you conclude that the distribution of the annual salaries of private industry workers in at least one state is different from the others? (Adapted from U.S. Bureau of Labor Statistics)


State            Annual salary (in thousands of dollars)
Kentucky         39.9  41.6  50.5  62.1  38.3  32.9  39.9
North Carolina   48.8  47.2  41.9  59.6  40.8  44.9  48.8
South Carolina   35.4  43.0  49.1  48.5  40.3  41.7  35.4
West Virginia    34.8  45.9  36.6  45.1  50.3  38.1  34.8




6. Caffeine Content The table shows the amounts of caffeine (in milligrams) in 16-ounce servings for a random sample of beverages. At α = 0.01, can you conclude that the distribution of the amounts of caffeine in at least one beverage is different from the others? (Adapted from Center for Science in the Public Interest)


Beverage        Amount of caffeine in 16-ounce serving (in milligrams)
Coffees         320  300  206  150  266
Soft drinks      95   96   56   51   71   72   47
Energy drinks   200  141  160  152  154  166
Teas            100  106   42   15   32   10


Extending Concepts 


Comparing Two Tests In Exercises 7 and 8,

(a) perform a Kruskal-Wallis test.
(b) perform a one-way ANOVA test, assuming that each population is normally distributed and the population variances are equal.
(c) compare the results.


7. Hospital Patient Stays An insurance underwriter claims that the number of days patients spend in the hospital is different in at least one region of the United States. The table shows the numbers of days randomly selected patients spent in the hospital in four U.S. regions. At α = 0.01, can you support the underwriter’s claim? (Adapted from U.S. National Center for Health Statistics)


Region      Number of days
Northeast   8  6  6  3  5  11  3  8
Midwest     5  4  3  9  1  6  3  4
South       5  8  1  5  8  7  5  1
West        2  3  6  6  1  5  3  6  5


8. Energy Consumption The table shows the energy consumed (in millions of Btu) in one year for a random sample of households from four U.S. regions. At α = 0.01, can you conclude that the energy consumed is different in at least one region? (Adapted from U.S. Energy Information Administration)


Region      Energy consumed (in millions of Btu)
Northeast   61   95  140  127   93   97   84  123   89  163
Midwest     59  158  169  140   95  187  123  104   88   37   72
South       86   35   67   86  142   69   65   62
West        81   39   85   35  113   46  125   70   77   63


SECTION 11.4 Rank Correlation 633 


What You Should Learn 


» How to use the Spearman 
rank correlation coefficient 
to determine whether the 
correlation between two 
variables is significant 


The Spearman Rank Correlation Coefficient 




In Section 9.1, you learned how to measure the strength of the relationship between two variables using the Pearson correlation coefficient r. Two requirements for the Pearson correlation coefficient are that the variables are linearly related and that the variables have a bivariate normal distribution. When these requirements cannot be met, you can examine the relationship between two variables using the nonparametric equivalent to the Pearson correlation coefficient: the Spearman rank correlation coefficient.

The Spearman rank correlation coefficient has several advantages over the Pearson correlation coefficient. For instance, the Spearman rank correlation coefficient can describe the relationship between two variables whether that relationship is linear or nonlinear, as long as it is monotonic (consistently increasing or consistently decreasing). The Spearman rank correlation coefficient can also be used for data at the ordinal level. And, the Spearman rank correlation coefficient is easier to calculate by hand than the Pearson correlation coefficient.
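To illustrate the point about nonlinear data, the Python sketch below compares the two coefficients for a made-up data set in which y = x³. The relationship is perfectly monotonic but not linear, so the Spearman coefficient equals 1 while the Pearson coefficient falls short of 1. The helper names are hypothetical, the Spearman value is computed as the Pearson coefficient of the ranks (which agrees with the rank formula when there are no ties), and the simple ranking shortcut assumes there are no tied values.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simple_ranks(values):
    """Ranks 1..n in ascending order (assumes no tied values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # nonlinear, but always increasing

r = pearson_r(x, y)
rs = pearson_r(simple_ranks(x), simple_ranks(y))  # Spearman = Pearson applied to ranks
print(f"Pearson r = {r:.3f}, Spearman r_s = {rs:.3f}")
```

For this data the Pearson coefficient is about 0.943, while the Spearman coefficient is exactly 1 because the ranks of x and y agree perfectly.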


DEFINITION 


The Spearman rank correlation coefficient rₛ is a measure of the strength of the relationship between two variables. The Spearman rank correlation coefficient is calculated using the ranks of paired sample data entries. If there are no ties in the ranks of either variable, then the formula for the Spearman rank correlation coefficient is

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

where n is the number of paired data entries and d is the difference between the ranks of a paired data entry. If there are ties in the ranks and the number of ties is small relative to the number of data pairs, then the formula can still be used to approximate rₛ.
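The definition translates directly into code. The Python sketch below, with hypothetical function names, assigns each group of tied entries the average of the ranks it occupies and then applies the formula; when ties are present, the result is only the approximation described above. The two calls check the extreme cases: identical ranks and reversed ranks.

```python
def average_ranks(values):
    """Rank values in ascending order, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend the group while the next value ties the current one.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # sorted positions pos..end hold ranks pos+1..end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

def spearman_rs(x, y):
    """r_s = 1 - 6 * sum(d^2) / (n(n^2 - 1)), using average ranks for ties."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(average_ranks(x), average_ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

print(spearman_rs([1, 2, 3, 4], [10, 20, 30, 40]))  # identical ranks
print(spearman_rs([1, 2, 3, 4], [40, 30, 20, 10]))  # reversed ranks
```

The first call prints 1.0 and the second prints -1.0, the two endpoints of the range of rₛ.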


The values of rₛ range from −1 to 1, inclusive. When the ranks of corresponding data pairs are exactly identical, rₛ is equal to 1. When the ranks are in “reverse” order, rₛ is equal to −1. When the ranks of corresponding data pairs have no relationship, rₛ is equal to 0.

After calculating the Spearman rank correlation coefficient, you can determine whether the correlation between the variables is significant. You can make this determination by performing a hypothesis test for the population correlation coefficient ρₛ. The null and alternative hypotheses for this test are listed below.


H₀: ρₛ = 0 (There is no correlation between the variables.)
Hₐ: ρₛ ≠ 0 (There is a significant correlation between the variables.)


Table 10 in Appendix B lists the critical values for the Spearman rank correlation coefficient for selected levels of significance and sample sizes. The test statistic for the hypothesis test is the Spearman rank correlation coefficient rₛ.




GUIDELINES

Testing the Significance of the Spearman Rank Correlation Coefficient

1. Identify the claim. State the null and alternative hypotheses.
   In symbols: State H₀ and Hₐ.
2. Specify the level of significance.
   In symbols: Identify α.
3. Determine the critical value.
   In symbols: Use Table 10 in Appendix B.
4. Find the test statistic.
   In symbols: $r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$
5. Make a decision to reject or fail to reject the null hypothesis.
   In symbols: If |rₛ| is greater than the critical value, then reject H₀. Otherwise, fail to reject H₀.
6. Interpret the decision in the context of the original claim.


The Spearman Rank Correlation Coefficient 


The table shows the school enrollments of males and females for a random sample of 10 colleges. At α = 0.05, can you conclude that there is a significant correlation between the number of males and the number of females enrolled at a college?


Male Female 


1786 2182 
4246 4415 
1419 1537 
1188 1236 
2394 2182 
1079 919 
4049 4209 
3595 3741 
1102 1086 
1345 1282 


SOLUTION 


The claim is “there is a significant correlation between the number of males and the number of females enrolled at a college.” The null and alternative hypotheses are listed below.


H₀: ρₛ = 0 (There is no correlation between the number of males and the number of females enrolled at a college.)


Hₐ: ρₛ ≠ 0 (There is a significant correlation between the number of males and the number of females enrolled at a college.) (Claim)


Study Tip 


Remember that in the case of a tie between data entries, use the average of the corresponding ranks.


Picturing the World


The table shows the retail prices 
(in dollars per pound) for ground 
beef and fresh whole chicken for 
a random sample of nine U.S. 
grocery stores. (Adapted from U.S. 
Bureau of Labor Statistics) 


Beef Chicken 


3.69 1.44 
3.66 1.42 
3.65 1.48 
3.68 1.50 
3.60 1.47 
3.55 1.46 
3.55 1.41 
3.56 1.47 
3.59 1.46 


Does a significant correlation exist between ground beef and chicken prices in U.S. grocery stores? Use α = 0.10.




Each data set has 10 entries. Because α = 0.05 and n = 10, the critical value is 0.648. Before calculating the test statistic, you must find Σd², the sum of the squares of the differences of the ranks of the data sets. You can use a table to calculate Σd², as shown below.


Male   Rank   Female   Rank   d      d²
1786   6      2182     6.5    −0.5   0.25
4246   10     4415     10     0      0
1419   5      1537     5      0      0
1188   3      1236     3      0      0
2394   7      2182     6.5    0.5    0.25
1079   1      919      1      0      0
4049   9      4209     9      0      0
3595   8      3741     8      0      0
1102   2      1086     2      0      0
1345   4      1282     4      0      0
                                     Σd² = 0.5


When n = 10 and Σd² = 0.5, the test statistic is

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(0.5)}{10(10^2 - 1)} \approx 0.997.$$


Because |rₛ| ≈ 0.997 > 0.648, you reject the null hypothesis.


Interpretation There is enough evidence at the 5% level of significance to 
conclude that there is a significant correlation between the number of males 
and the number of females enrolled at a college. 
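The calculation in this example can be reproduced with a short Python sketch. It ranks each variable (giving the two tied female enrollments of 2182 the average rank 6.5), sums the squared rank differences, and compares |rₛ| with the critical value 0.648 quoted above from Table 10. The helper name `avg_ranks` is hypothetical.

```python
male = [1786, 4246, 1419, 1188, 2394, 1079, 4049, 3595, 1102, 1345]
female = [2182, 4415, 1537, 1236, 2182, 919, 4209, 3741, 1086, 1282]

def avg_ranks(values):
    """Ascending ranks; tied values share the average of the ranks they occupy."""
    s = sorted(values)
    ranks = []
    for v in values:
        first = s.index(v) + 1          # lowest rank occupied by v
        last = first + s.count(v) - 1   # highest rank occupied by v
        ranks.append((first + last) / 2)
    return ranks

n = len(male)
sum_d2 = sum((a - b) ** 2 for a, b in zip(avg_ranks(male), avg_ranks(female)))
rs = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

critical = 0.648  # Table 10 value for alpha = 0.05 and n = 10, as quoted in the text
print(f"sum of d^2 = {sum_d2}, r_s = {rs:.3f}")
print("reject H0" if abs(rs) > critical else "fail to reject H0")
```

The output matches the hand calculation: Σd² = 0.5, rₛ ≈ 0.997, and the null hypothesis is rejected.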


TRY IT YOURSELF 1 


The table shows the prices (in dollars per bushel) received for oat and wheat for a random sample of seven U.S. farmers. At α = 0.10, can you conclude that there is a significant correlation between the oat and wheat prices? (Adapted from U.S. Department of Agriculture)


Oat Wheat 
1.84 3.67 
1.97 3.49 
2.03 3.68 
2.25 3.88 
2.35 3.91 
2.31 4.02 
2.40 4.15 


Answer: Page T1 




11.4 EXERCISES 


For Extra Help: MyLab Statistics 


Building Basic Skills and Vocabulary 


1. What are some advantages of the Spearman rank correlation coefficient over 
the Pearson correlation coefficient? 


2. Describe the ranges of the Spearman rank correlation coefficient and the 
Pearson correlation coefficient. 


3. What does it mean when rₛ is equal to 1? What does it mean when rₛ is equal to −1? What does it mean when rₛ is equal to 0?


4. Explain, in your own words, what rₛ and ρₛ represent in Example 1.


Using and Interpreting Concepts 


Testing a Claim In Exercises 5-8, (a) identify the claim and state H₀ and Hₐ, (b) find the critical value, (c) find the test statistic rₛ, (d) decide whether to reject or fail to reject the null hypothesis, and (e) interpret the decision in the context of the original claim.


5. Farming Expenses In an agricultural report, a commodities analyst claims 
that there is a significant correlation between purchased seed expenses and 
fertilizer and lime expenses in the farming business. The table shows the 
total purchased seed expenses and fertilizer and lime expenses for farms in 
eight randomly selected states for a recent year. At α = 0.05, is there enough
evidence to support the analyst’s claim? (Source: U.S. Department of Agriculture) 


State            Purchased seed expenses      Fertilizer and lime expenses
                 (in millions of dollars)     (in millions of dollars)
Arkansas 490 480 
California 1530 2060 
Florida 490 480 
Kentucky 266 402 
Michigan 741 642 
North Carolina 380 470 
Ohio 879 858 
Washington 360 560 


6. Exercise Machines The table shows the overall scores and the prices for 
a random sample of nine different models of elliptical exercise machines. 
The overall score represents the ergonomics, exercise range, ease of use, 
construction, heart-rate monitoring, and safety. At α = 0.05, can you
conclude that there is a significant correlation between the overall score and 
the price? (Source: Consumer Reports) 


Overall score        77    75    73    71
Price (in dollars)   3700  1700  1300  900

Overall score        66    66    64    62    58
Price (in dollars)   1000  1400  1800  1000  700


7. Crop Prices The table shows the prices (in dollars per bushel) received for barley and corn for a random sample of nine U.S. farmers. At α = 0.05, can you conclude that there is a significant correlation between the barley and corn prices? (Adapted from U.S. Department of Agriculture)


Barley   4.89  4.52  4.85  4.97  5.12  4.91  5.08  4.98  4.87
Corn     3.21  3.22  3.29  3.23  3.33  3.40  3.44  3.49  3.43


8. Vacuum Cleaners The table shows the overall scores and the prices for a random sample of 12 different models of vacuum cleaners. The overall score represents cleaning, airflow, handling, noise, and emissions. At α = 0.10, can you conclude that there is a significant correlation between the overall score and the price? (Source: Consumer Reports)

Overall score        65   71   69   47   55   38
Price (in dollars)   150  200  550  350  470  90

Overall score        47   47   47   a7   34   65
Price (in dollars)   80   130  210  190  300  260


Test Scores and GNI In Exercises 9-12, use the table below. The table shows the average achievement scores of 15-year-olds in science and mathematics along with the gross national incomes (GNI) of nine randomly selected countries for a recent year. (The GNI is a measure of the total value of goods and services produced by the economy of a country.) (Source: Organization for Economic Cooperation and Development; The World Bank)


Country          Science average   Mathematics average   GNI (in billions of dollars)
Canada           528               516                   1,529
France           495               493                   2,458
Germany          509               506                   3,437
Italy            481               490                   1,815
Japan            538               532                   4,549
Mexico           416               408                   1,143
Spain            493               486                   1,192
Sweden           493               494                   503
United States    496               470                   18,496


9. Science and GNI At α = 0.10, can you conclude that there is a significant correlation between science achievement scores and GNI?


10. Math and GNI At α = 0.10, can you conclude that there is a significant correlation between mathematics achievement scores and GNI?


11. Science and Math At α = 0.10, can you conclude that there is a significant correlation between science and mathematics achievement scores?


12. Writing a Summary Use the results from Exercises 9-11 to write a summary about the correlation (or lack of correlation) between test scores and GNI.




Extending Concepts 


Testing the Spearman Rank Correlation Coefficient for n > 30
When you are testing the significance of the Spearman rank correlation coefficient and the sample size n is greater than 30, you can use the expression below to find the critical value.

$$\frac{\pm z}{\sqrt{n - 1}}$$

where z corresponds to the level of significance.

In Exercises 13 and 14, test the Spearman rank correlation coefficient.
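The large-sample critical value can be computed rather than looked up. The Python sketch below assumes the two-tailed test (Hₐ: ρₛ ≠ 0) used throughout this section, so z is the standard normal value with α/2 in each tail; the function name is hypothetical, and `statistics.NormalDist` requires Python 3.8 or later.

```python
import math
from statistics import NormalDist

def spearman_critical_large_n(n, alpha):
    """Approximate two-tailed critical value z / sqrt(n - 1) for n > 30."""
    if n <= 30:
        raise ValueError("use Table 10 in Appendix B when n <= 30")
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z with alpha / 2 in each tail
    return z / math.sqrt(n - 1)

# Hypothetical usage: n = 37 pairs at alpha = 0.05 gives about 1.96 / 6.
print(round(spearman_critical_large_n(37, 0.05), 4))
```

You would then reject H₀ when |rₛ| exceeds the value returned.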


13. Work Injuries The table shows the average hours worked per week and the numbers of on-the-job injuries for a random sample of U.S. companies in a recent year. At α = 0.10, can you conclude that there is a significant correlation between average hours worked and the number of on-the-job injuries?


Hours worked   46  43  41  40  41  42  45  45  42  45  44  44
Injuries       22  25  18  17  20  22  28  29  24  26  26  25

Hours worked   45  46  47  47  46  46  49  50  50  42  41  42
Injuries       27  29  29  30  29  29  30  30  30  23  22  23

Hours worked   41  41  41  41  40  39  38  39  39
Injuries       21  19  18  18  17  16  16  16  16


14. Work Injuries in Construction The table shows the average hours worked per week and the numbers of on-the-job injuries for a random sample of U.S. construction companies in a recent year. At α = 0.05, can you conclude that there is a significant correlation between average hours worked and the number of on-the-job injuries?


Hours worked   38  38  37  38  38  40  39  39  39  40  39  41
Injuries       11  11   9  10  10  17  15  14  14  16  15  17

Hours worked   41  42  41  41  41  42  42  42  42  41  41  39
Injuries       17  21  18  18  18  22  21  19  21  18  17  12

Hours worked   38  38  39  39  36  37  36  37  37  37  37
Injuries       12  11  13  12   6   6   6   6   7   8   7


SECTION 11.5. The Runs Test 639 


The Runs Test


What You Should Learn
» How to use the runs test to determine whether a data set is random


The Runs Test for Randomness


In obtaining a sample of data, it is important for the data to be selected randomly. 
But how do you know whether the sample data are truly random? One way to 
test for randomness in a data set is to use a runs test for randomness. 

Before using a runs test for randomness, you must first know how to 
determine the number of runs in a data set. 


DEFINITION 


A run is a sequence of data having the same characteristic. Each run is 
preceded by and followed by data with a different characteristic or by no data 
at all. The number of data in a run is called the length of the run. 


Finding the Number of Runs 


A liquid-dispensing machine has been designed to fill one-liter bottles. A 
quality control inspector decides whether each bottle is filled to an acceptable 
level and passes inspection (P) or fails inspection (F). Determine the number
of runs for each sequence and find the length of each run. 


1. PPPPPPPPFFFFFFFF
2. PFPFPFPFPFPFPFPF
3. PPFFFFPFFFPPPPPP


SOLUTION 


1. There are two runs. The first 8 P’s form a run of length 8, and the next 8 F’s form another run of length 8, as shown below.

PPPPPPPP | FFFFFFFF
1st run    2nd run

2. There are 16 runs, each of length 1, as shown below.

P | F | P | F | P | F | P | F | P | F | P | F | P | F | P | F
1st run, 2nd run, ..., 16th run

3. There are 5 runs: the first of length 2, the second of length 4, the third of length 1, the fourth of length 3, and the fifth of length 6, as shown below.

PP | FFFF | P | FFF | PPPPPP
1st run | 2nd run | 3rd run | 4th run | 5th run
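Counting runs is easy to automate. In the Python sketch below, `itertools.groupby` groups consecutive equal symbols, so each group is exactly one run; the helper name `runs` is hypothetical.

```python
from itertools import groupby

def runs(sequence):
    """Return the runs in a sequence as (symbol, length) pairs, in order."""
    return [(symbol, len(list(group))) for symbol, group in groupby(sequence)]

for seq in ["PPPPPPPPFFFFFFFF", "PFPFPFPFPFPFPFPF", "PPFFFFPFFFPPPPPP"]:
    found = runs(seq)
    print(len(found), [length for _, length in found])
```

The three printed lines give 2 runs of lengths 8 and 8, then 16 runs of length 1, then 5 runs of lengths 2, 4, 1, 3, and 6, matching the solution above.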




TRY IT YOURSELF 1 


A machine produces engine parts. An inspector measures the diameter of each 
engine part and determines whether the part passes inspection (P) or fails 
inspection (F). The results are shown below. Determine the number of runs
in the sequence and find the length of each run. 


PPPFPFPPPPFFPFPPFFFPPPFPPP 
Answer: Page T1 


When each value in a set of data can be categorized into one of two separate 
categories, you can use the runs test for randomness to determine whether the 
data are random. 


DEFINITION 


The runs test for randomness is a nonparametric test that can be used to determine whether a sequence of sample data is random.


The runs test for randomness considers the number of runs in a sequence of 
sample data in order to test whether a sequence is random. When a sequence has 
too few or too many runs, it is usually not random. For instance, the sequence 


PPPPPPPPFFFFFFFF 
from Example 1, part 1, has too few runs (only 2 runs). The sequence 
PFPFPFPFPFPFPFPF 


from Example 1, part 2, has too many runs (16 runs). So, these sample data are 
probably not random. 

You can use a hypothesis test to determine whether the number of runs in a 
sequence of sample data is too high or too low. The runs test is a two-tailed test, 
and the null and alternative hypotheses are listed below. 


H₀: The sequence of data is random.
Hₐ: The sequence of data is not random.


When using the runs test, let n₁ represent the number of data that have one characteristic and let n₂ represent the number of data that have the second characteristic. It does not matter which characteristic you choose to be represented by n₁. Let G represent the number of runs.


n₁ = number of data with one characteristic
n₂ = number of data with the other characteristic
G = number of runs


Table 12 in Appendix B lists the critical values for the runs test for selected values of n₁ and n₂ at the α = 0.05 level of significance. (In this text, you will use only the α = 0.05 level of significance when performing runs tests.) When n₁ or n₂ is greater than 20, you can use the standard normal distribution to find the critical values.




You can calculate the test statistic for the runs test as follows. 


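Test Statistic for the Runs Test


When n₁ ≤ 20 and n₂ ≤ 20, the test statistic for the runs test is G, the number of runs.


When n₁ > 20 or n₂ > 20, the test statistic for the runs test is

$$z = \frac{G - \mu_G}{\sigma_G}$$

where

$$\mu_G = \frac{2n_1n_2}{n_1 + n_2} + 1 \quad \text{and} \quad \sigma_G = \sqrt{\frac{2n_1n_2(2n_1n_2 - n_1 - n_2)}{(n_1 + n_2)^2(n_1 + n_2 - 1)}}.$$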
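The large-sample test statistic can be checked with a short Python sketch. The function name `runs_z` is hypothetical; it evaluates the mean μ_G = 2n₁n₂/(n₁ + n₂) + 1, the standard deviation σ_G from the formula above, and z = (G − μ_G)/σ_G, and it is meant for the case n₁ > 20 or n₂ > 20.

```python
import math

def runs_z(g, n1, n2):
    """Normal-approximation test statistic for the runs test (n1 > 20 or n2 > 20)."""
    mu = 2 * n1 * n2 / (n1 + n2) + 1                   # mean number of runs
    var = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
           / ((n1 + n2) ** 2 * (n1 + n2 - 1)))          # variance of the number of runs
    return (g - mu) / math.sqrt(var)

# Hypothetical usage: 25 of each symbol and 26 runs, exactly the expected count.
print(runs_z(26, 25, 25))
```

A sequence with exactly μ_G runs gives z = 0; at α = 0.05 you would reject randomness when |z| > 1.96.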


GUIDELINES

Performing a Runs Test for Randomness

1. Identify the claim. State the null and alternative hypotheses.
   In symbols: State H₀ and Hₐ.
2. Specify the level of significance.
   In symbols: Identify α. (Use α = 0.05 for the runs test.)
3. Determine the number of data that have each characteristic and the number of runs.
   In symbols: Determine n₁, n₂, and G.
4. Determine the critical values.
   In symbols: When n₁ ≤ 20 and n₂ ≤ 20, use Table 12 in Appendix B. When n₁ > 20 or n₂ > 20, use Table 4 in Appendix B.
5. Find the test statistic.
   In symbols: When n₁ ≤ 20 and n₂ ≤ 20, use G. When n₁ > 20 or n₂ > 20, use $z = \frac{G - \mu_G}{\sigma_G}$.
6. Make a decision to reject or fail to reject the null hypothesis.
   In symbols: If G is less than or equal to the lower critical value or greater than or equal to the upper critical value, then reject H₀. Otherwise, fail to reject H₀. Or, if z is in the rejection region, then reject H₀. Otherwise, fail to reject H₀.
7. Interpret the decision in the context of the original claim.




Using the Runs Test 


As people enter a concert, an usher records where they are sitting. The results for 13 people are shown, where L represents a lawn seat and P represents a pavilion seat. At α = 0.05, can you conclude that the sequence of seat locations is not random?


LLLPPLPPPLLPL 
SOLUTION 


The claim is “the sequence of seat locations is not random.” To test this claim, use the null and alternative hypotheses below.
H₀: The sequence of seat locations is random.
Hₐ: The sequence of seat locations is not random. (Claim)


To find the critical values, first determine n₁, the number of L’s; n₂, the number of P’s; and G, the number of runs.


LLL | PP | L | PPP | LL | P | L
1st run | 2nd run | 3rd run | 4th run | 5th run | 6th run | 7th run

n₁ = number of L’s = 7
n₂ = number of P’s = 6
G = number of runs = 7


Because n₁ ≤ 20, n₂ ≤ 20, and α = 0.05, use Table 12 to find the lower critical value 3 and the upper critical value 12. The test statistic is the number of runs G = 7. Because the test statistic G is between the critical values 3 and 12, you fail to reject the null hypothesis.


Interpretation There is not enough evidence at the 5% level of significance 
to support the claim that the sequence of seat locations is not random. So, it 
appears that the sequence of seat locations is random. 


TRY IT YOURSELF 2 


The genders of 15 students as they enter a classroom are shown below, where 
F represents a female and M represents a male. At α = 0.05, can you conclude
that the sequence of genders is not random? 


MFFFMMFFMFMMFFF 
Answer: Page T1 




Using the Runs Test 


You want to determine whether the selection of recently hired employees 
in a large company is random with respect to gender. The genders of 
36 recently hired employees are shown below, where F represents a female 
and M represents a male. At α = 0.05, can you conclude that the sequence of
employees is not random? 


MMFFFFMMMMMMFFFFFMM 
MMMMMFFFMMMMFMMFM 


SOLUTION 


The claim is “the sequence of employees is not random.” To test this claim, use 
the null and alternative hypotheses below. 


H₀: The sequence of employees is random.
Hₐ: The sequence of employees is not random. (Claim)


To find the critical values, first determine n₁, the number of F’s; n₂, the number of M’s; and G, the number of runs.

MM | FFFF | MMMMMM | FFFFF | MMMMMMM | FFF | MMMM | F | MM | F | M
1st | 2nd | 3rd | 4th | 5th | 6th | 7th | 8th | 9th | 10th | 11th run

n₁ = number of F’s = 14
n₂ = number of M’s = 22
G = number of runs = 11


Because n₂ > 20, use Table 4 in Appendix B to find the critical values. Because the test is a two-tailed test with α = 0.05, the critical values are

−z₀ = −1.96 and z₀ = 1.96.

Before calculating the test statistic, find the values of μ_G and σ_G, as follows.

$$\mu_G = \frac{2n_1n_2}{n_1 + n_2} + 1 = \frac{2(14)(22)}{14 + 22} + 1 = \frac{616}{36} + 1 \approx 18.11$$




Picturing the World


The sequence shows the National 
Football League conference of 
each winning team for the first 51 
Super Bowls, where A represents 
the American Football Conference 
and N represents the National 
Football Conference. (Source: 
National Football League) 


NNAAANAAAAA 
NAAANNANNNN 
NNNNNNNNNAA 
NAANAAAANAN 
NNANAAA 


At α = 0.05, can you conclude that the sequence of conferences of Super Bowl winning teams is random?


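$$\sigma_G = \sqrt{\frac{2n_1n_2(2n_1n_2 - n_1 - n_2)}{(n_1 + n_2)^2(n_1 + n_2 - 1)}} = \sqrt{\frac{2(14)(22)[2(14)(22) - 14 - 22]}{(14 + 22)^2(14 + 22 - 1)}} \approx 2.81$$

You can find the test statistic as follows.

$$z = \frac{G - \mu_G}{\sigma_G} \approx \frac{11 - 18.11}{2.81} \approx -2.53$$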


The figure shows the location of the rejection regions and the test statistic z. Because z is in the rejection region, you reject the null hypothesis.

[Figure: standard normal curve with rejection regions beyond −1.96 and 1.96 and the test statistic z ≈ −2.53]


Interpretation There is enough evidence at the 5% level of significance to 
support the claim that the sequence of employees with respect to gender is 
not random. 


TRY IT YOURSELF 3 


Let S represent a day in a small town in which it snowed and let N represent 
a day in the same town in which it did not snow. The snowfall results for the 
entire month of January are shown below. At α = 0.05, can you conclude that
the sequence is not random? 


NNNSSNNSNSNNNNNS 
NSNSNNSNSSNNNNN 
Answer: Page T1 


When n₁ or n₂ is greater than 20, you can also use a P-value to perform a hypothesis test for the randomness of the data. In Example 3, you can calculate the P-value to be 0.0114. Because P < α, you reject the null hypothesis.
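The large-sample calculation in Example 3, including its P-value, can be reproduced in Python starting from the raw sequence of hires. This is a sketch: it counts runs with `itertools.groupby` and computes the two-tailed P-value as erfc(|z|/√2), which equals 2[1 − Φ(|z|)]. Because it does not round z to −2.53 before finding the P-value, its P-value can differ from 0.0114 in the fourth decimal place.

```python
import math
from itertools import groupby

hires = "MMFFFFMMMMMMFFFFFMM" + "MMMMMFFFMMMMFMMFM"  # the 36 hires, in order

n1 = hires.count("F")                  # number of F's
n2 = hires.count("M")                  # number of M's
g = sum(1 for _ in groupby(hires))     # number of runs

mu = 2 * n1 * n2 / (n1 + n2) + 1
sigma = math.sqrt(2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
                  / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
z = (g - mu) / sigma

# Two-tailed P-value from the standard normal distribution.
p_value = math.erfc(abs(z) / math.sqrt(2))
print(n1, n2, g, round(mu, 2), round(sigma, 2), round(z, 2), round(p_value, 4))
```

Note that the two lines of the sequence must be joined before counting runs: the MM at the end of the first line and the MMMMM at the start of the second line form the single 5th run.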




11.5 EXERCISES


Building Basic Skills and Vocabulary 


1. In your own words, explain why the hypothesis test discussed in this section 
is called the runs test. 


2. Describe the test statistic for the runs test when the sample sizes n₁ and n₂ are less than or equal to 20 and when either n₁ or n₂ is greater than 20.


Using and Interpreting Concepts 


Finding the Number of Runs In Exercises 3-6, determine the number of runs in the sequence. Then find the length of each run.


3. TFTFTTTFFFTF
4. UUDDUDUUDDUDUU
5. MFMFMFFFFFFMMMFFMMMM
6. AAABBBABBAAAAAABAABABB


7. Find the values of n₁ and n₂ in Exercise 3.
8. Find the values of n₁ and n₂ in Exercise 4.
9. Find the values of n₁ and n₂ in Exercise 5.
10. Find the values of n₁ and n₂ in Exercise 6.


Finding Critical Values In Exercises 11-14, use the sequence and Table 12 in Appendix B to determine the number of runs that are considered too high and the number of runs that are considered too low for the data to be in random order.


11. TFTFTFTFTFTF
12. MFMMMMMMFFMM
13. NSSSNNNNNSNSNSSNNN
14. XXXXXXXYYYYYYYYYYYYYY


Performing a Runs Test In Exercises 15-20, (a) identify the claim and state H₀ and Hₐ, (b) find the critical values, (c) find the test statistic, (d) decide whether to reject or fail to reject the null hypothesis, and (e) interpret the decision in the context of the original claim. Use α = 0.05.


15. Coin Toss A coach records the results of the coin toss at the beginning of 
each football game for a season. The results are shown, where H represents 
heads and T represents tails. The coach claimed the tosses were not random. 
Test the coach’s claim. 


HHTTTHTHHHTTTTHTHH


16. Senate The sequence shows the majority party of the U.S. Senate
after each election for a recent group of years, where R represents the 
Republican party and D represents the Democratic party. Can you 
conclude that the sequence is not random? (Source: U.S. Senate) 


RDDDRRRRRRRDDDDDDD 
RDDRDDDDDDDDDDDDDR 
RRDDDDRRRDRRDDDDRR 




17. Baseball The sequence shows the Major League Baseball league of
each World Series winning team from 1969 to 2016, where N represents 
the National League and A represents the American League. Can you 
conclude that the sequence of leagues of World Series winning teams is 
not random? (Source: Major League Baseball) 


NANAAANNAANNNNAA 
ANANANAAANANAAAN 
ANAANANANNNANAN 


18. Number Generator A number generator outputs the sequence of digits shown, where O represents an odd digit and E represents an even digit. Test the claim that the digits were not randomly generated.


OOOEEEEOOOOOOEEEE
OOEEEEOOOOOEEEEOO


19. Dog Identifications A team of veterinarians records, in order, the genders of every dog that is microchipped at their pet hospital in one month. The genders of recently microchipped dogs are shown, where F represents a female and M represents a male. A veterinarian claims that the dogs are microchipped in random order with respect to gender. Do you have enough evidence to reject the veterinarian’s claim?


MMFMFFFFFMMMFF 
FMFFFFFMFFFMFFF 


20. Golf Tournament A golf tournament official records whether each
past winner is American-born (A) or foreign-born (F). The results are 
shown for every year the tournament has existed. Can you conclude 
that the sequence is not random? 


FFAFFAFFAFFAFFAFFAFF 
FFFFAFFAFFAFFAFFAFAF 
FAFFFFFAFFFFFAFFFA 


Extending Concepts 


Runs Test with Quantitative Data In Exercises 21-23, use the following information to perform a runs test. You can also use the runs test for randomness with quantitative data. First, calculate the median. Then assign a + sign to those values above the median and a − sign to those values below the median. Ignore any values that are equal to the median. Use α = 0.05.
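The sign-assignment procedure above can be sketched in Python. The helper names are hypothetical and the demo data are made up; the sketch converts each value to + or − relative to the median, drops values equal to the median, and then counts runs in the resulting sign sequence, which is what the runs test uses.

```python
from itertools import groupby
from statistics import median

def signs_about_median(data):
    """Map each value to '+' (above the median) or '-' (below); drop values equal to it."""
    m = median(data)
    return "".join("+" if x > m else "-" for x in data if x != m)

def count_runs(symbols):
    """Number of runs: each group of consecutive equal symbols is one run."""
    return sum(1 for _ in groupby(symbols))

demo = [1, 5, 2, 6, 3, 7]  # hypothetical data with median 4
signs = signs_about_median(demo)
print(signs, count_runs(signs))
```

Here n₁ and n₂ are simply the counts of + and − signs, so the rest of the runs test proceeds exactly as for categorical data.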


21. Daily High Temperatures The sequence shows the daily high
temperatures (in degrees Fahrenheit) for a city during the month of 
July. Test the claim that the daily high temperatures do not occur 
randomly. 


84 87 92 93 95 84 82 83 81 87 92 98 99 93 84 85 
86 92 91 95 84 92 83 81 87 92 98 89 93 84 85 


22. Exam Scores The sequence shows the exam scores of a class based on
the order in which the students finished the test. Test the claim that the 
scores occur randomly. 


83 94 80 76 92 89 65 75 82 87 90 91 81 99 97 72 
72 89 90 92 87 76 74 66 88 81 90 92 89 76 80 


23. Use technology to generate a sequence of 30 numbers from 1 to 99, inclusive. 
Test the claim that the sequence of numbers is not random. 


Uses and Abuses: Statistics in the Real World


Uses 


Nonparametric Tests Before you could perform many of the hypothesis tests 
you learned about in previous chapters, you had to ensure that certain conditions 
about the population were satisfied. For instance, before you could perform 
a t-test, you had to verify that the population was normally distributed or the 
sample size was at least 30. One advantage of the nonparametric tests shown 
in this chapter is that they are distribution free. That is, they do not require 
any particular information about the population or populations being tested. 
Another advantage of nonparametric tests is that they are easier to perform than 
their parametric counterparts. This means that they are easier to understand 
and quicker to use. Nonparametric tests can often be used when data are at the 
nominal or ordinal level. 


Abuses 


Insufficient Evidence Stronger evidence is needed to reject a null hypothesis 
in a nonparametric test than in a corresponding parametric test. That is, when 
you are trying to support a claim represented by the alternative hypothesis, you 
might need a larger sample when performing a nonparametric test. When the 
outcome of a nonparametric test results in failure to reject the null hypothesis, 
you should investigate the sample size used. It may be that a larger sample will 
produce different results. 


Using an Inappropriate Test In general, when information about the population
(such as the condition of normality) is known, it is more efficient to use a 
parametric test. When information about the population is not known, however, 
nonparametric tests can be helpful. 


EXERCISES 


1. Insufficient Evidence Give an example of a nonparametric test in which 
there is not enough evidence to reject the null hypothesis. 


2. Using an Inappropriate Test Discuss the nonparametric tests described in 
this chapter and match each test with its parametric counterpart, which you 
studied in earlier chapters. 


CHAPTER 11 Nonparametric Tests


Chapter Summary


What Did You Learn? (Example(s); Review Exercises)

Section 11.1
» How to use the sign test to test a population median (Examples 1, 2; Review Exercises 1-3, 6)
$$z = \frac{(x + 0.5) - 0.5n}{\sqrt{n}/2}$$
» How to use the paired-sample sign test to test the difference between two population medians (dependent samples) (Example 3; Review Exercises 4, 5)

Section 11.2
» How to use the Wilcoxon signed-rank test and the Wilcoxon rank sum test to determine whether two samples are selected from populations having the same distribution (Examples 1, 2; Review Exercises 7, 8)
$$z = \frac{R - \mu_R}{\sigma_R}, \qquad \mu_R = \frac{n_1(n_1 + n_2 + 1)}{2}, \qquad \sigma_R = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$$

Section 11.3
» How to use the Kruskal-Wallis test to determine whether three or more samples were selected from populations having the same distribution (Example 1; Review Exercises 9, 10)
$$H = \frac{12}{N(N + 1)}\left(\frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \cdots + \frac{R_k^2}{n_k}\right) - 3(N + 1)$$

Section 11.4
» How to use the Spearman rank correlation coefficient to determine whether the correlation between two variables is significant (Example 1; Review Exercises 11, 12)
$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

Section 11.5
» How to use the runs test to determine whether a data set is random (Examples 1-3; Review Exercises 13, 14)
$$G = \text{number of runs}, \qquad z = \frac{G - \mu_G}{\sigma_G}, \qquad \mu_G = \frac{2n_1 n_2}{n_1 + n_2} + 1$$
$$\sigma_G = \sqrt{\frac{2n_1 n_2(2n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2(n_1 + n_2 - 1)}}$$
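As an illustration of the Section 11.2 formulas, the sketch below computes the Wilcoxon rank sum z statistic. It assumes that R is the rank sum of the smaller sample (either sample when the sizes are equal) and that tied values receive the average of the ranks they would occupy; both are standard conventions rather than details restated in this summary. The function names are hypothetical, and the data are the hourly earnings from Chapter Quiz Exercise 2.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j across a block of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_z(sample_a, sample_b):
    """Wilcoxon rank sum z = (R - mu_R) / sigma_R, with R taken from the smaller sample."""
    if len(sample_a) > len(sample_b):
        sample_a, sample_b = sample_b, sample_a
    n1, n2 = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    r = sum(ranks[:n1])  # the first n1 pooled positions belong to the smaller sample
    mu_r = n1 * (n1 + n2 + 1) / 2
    sigma_r = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (r - mu_r) / sigma_r


union = [29.75, 28.15, 32.30, 35.52, 32.88, 27.85, 27.35, 29.05, 27.60, 26.75]
nonunion = [26.15, 23.10, 21.20, 26.95, 22.05, 24.75, 22.50, 22.25, 21.40, 20.45]
print(round(rank_sum_z(union, nonunion), 2))
```

Because the samples are swapped when the first one is larger, the sign of z always refers to the smaller sample; compare |z| with the critical value(s) for the stated level of significance.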


The table summarizes parametric and nonparametric tests. Always use 
the parametric test when the conditions for that test are satisfied. 


Test application | Parametric test | Nonparametric test
One-sample tests | z-test for a population mean; t-test for a population mean | Sign test for a population median
Two-sample tests: dependent samples | t-test for the difference between means | Paired-sample sign test; Wilcoxon signed-rank test
Two-sample tests: independent samples | z-test for the difference between means; t-test for the difference between means | Wilcoxon rank sum test
Tests involving three or more samples | One-way ANOVA | Kruskal-Wallis test
Correlation | Pearson correlation coefficient | Spearman rank correlation coefficient
Randomness | (No parametric test) | Runs test


Review Exercises


Section 11.1 


In Exercises 1-6, use a sign test to test the claim by doing the following. 


(a) Identify the claim and state H₀ and Hₐ.

(b) Find the critical value. 

(c) Find the test statistic. 

(d) Decide whether to reject or fail to reject the null hypothesis. 


(e) Interpret the decision in the context of the original claim. 


1. A store manager claims that the median number of customers per day is no 
more than 650. The numbers of customers per day for 17 randomly selected 
days are listed below. At α = 0.01, can you reject the manager’s claim?


675 665 601 642 554 653 639 650 645 
550 677 569 650 660 682 689 590 


2. A company claims that the median credit score for U.S. adults is at least 710. 
The credit scores for 13 randomly selected U.S. adults are listed below. 
At α = 0.05, can you reject the company’s claim? (Adapted from Fair
Isaac Corporation) 


750 782 805 695 700 706 625 
589 690 772 745 704 710 


3. A government agency claims that the median sentence length for all federal 
prisoners is 2 years. In a random sample of 180 federal prisoners, 65 have 
sentence lengths that are less than 2 years, 109 have sentence lengths that 
are more than 2 years, and 6 have sentence lengths that are 2 years. At 
α = 0.10, can you reject the agency’s claim? (Adapted from U.S. Sentencing
Commission) 


4. In a study testing the effects of calcium supplements on blood pressure
in men, 10 randomly selected men were given a calcium supplement 
for 12 weeks. The table shows the measurements for each subject’s 
diastolic blood pressure taken before and after the 12-week treatment 
period. At α = 0.05, can you reject the claim that there was no
reduction in diastolic blood pressure? (Adapted from the American 
Medical Association) 
Patient | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Before treatment | 107 | 110 | 123 | 129 | 112 | 111 | 107 | 112 | 136 | 102
After treatment | 100 | 114 | 105 | 112 | 115 | 116 | 106 | 102 | 125 | 104




5. In a study testing the effects of an herbal supplement on blood pressure
in men, 11 randomly selected men were given an herbal supplement for 
12 weeks. The table shows the measurements for each subject’s diastolic 
blood pressure taken before and after the 12-week treatment period. 
At α = 0.05, can you reject the claim that there was no reduction in
diastolic blood pressure? (Adapted from The Journal of the American 
Medical Association) 


Patient | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11
Before treatment | 123 | 109 | 112 | 102 | 98 | 114 | 119 | 112 | 110 | 117 | 130
After treatment | 124 | 97 | 113 | 105 | 95 | 119 | 114 | 114 | 121 | 118 | 133


6. An association claims that the median annual salary of lawyers is $118,160. 
In a random sample of 125 lawyers, 76 were paid less than $118,160, and 49 
were paid more than $118,160. At α = 0.05, can you reject the association’s
claim? (Adapted from U.S. Bureau of Labor Statistics) 


Section 11.2 


In Exercises 7 and 8, use a Wilcoxon test to test the claim by doing the following. 


(a) Identify the claim and state H₀ and Hₐ.


(b) Decide whether to use a Wilcoxon signed-rank test or a Wilcoxon rank 
sum test. 


(c) Find the critical value(s). 

(d) Find the test statistic. 

(e) Decide whether to reject or fail to reject the null hypothesis. 

(f) Interpret the decision in the context of the original claim. 

7. A career placement advisor claims that there is a difference in the total
times required to earn a doctorate degree by female and male graduate
students. The table shows the total times (in years) to earn a doctorate
for a random sample of 12 female and 12 male graduate students. At
α = 0.01, can you support the advisor’s claim? (Adapted from Survey of
Earned Doctorates)


Female | 9 | 11 | 9 | 12 | 11 | 8 | 10 | 13 | 6 | 6 | 8 | 9
Male | 8 | 7 | 8 | 10 | 9 | (illegible) | 7 | 9 | 10 | 8 | 9 | 7




8. A medical researcher claims that a new drug affects the number of headache 
hours experienced by headache sufferers. The numbers of headache hours 
(per day) experienced by eight randomly selected patients before and after 
taking the drug are shown in the table. At α = 0.05, can you support the
researcher’s claim? 


Patient | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Headache hours (before) | 0.9 | 2.3 | 2.7 | 2.4 | 2.9 | 1.9 | 1.2 | 3.1
Headache hours (after) | 1.4 | 1.5 | 1.4 | 1.8 | 1.3 | 0.6 | 0.7 | 1.9


Section 11.3 

In Exercises 9 and 10, use the Kruskal-Wallis test to test the claim by doing 
the following. 

(a) Identify the claim and state H₀ and Hₐ.

(b) Find the critical value and identify the rejection region. 

(c) Find the test statistic H. 

(d) Decide whether to reject or fail to reject the null hypothesis. 


(e) Interpret the decision in the context of the original claim. 


9. The table shows the ages for a random sample of doctorate recipients in
three fields of study. At α = 0.01, can you conclude that the distribution
of the ages of the doctorate recipients in at least one field of study is 
different from the others? (Adapted from Survey of Earned Doctorates) 

Field of study | Age
Life sciences | 31 | 32 | 34 | 31 | 30 | 32 | 35 | 31 | 32 | 34 | 29
Physical sciences | 30 | 31 | 32 | 31 | 30 | 29 | 31 | 30 | 32 | 33 | 30
Social sciences | 32 | 35 | 31 | 33 | 34 | 31 | 35 | 36 | 32 | 30 | 33


10. The table shows the starting salaries for a random sample of college
graduates in four fields of engineering. At α = 0.05, can you conclude
that the distribution of the starting salaries in at least one field 
of engineering is different from the others? (Adapted from National 
Association of Colleges and Employers) 


Field of engineering | Starting salary (in thousands of dollars)
Chemical | 68.4 | 65.9 | 71.7 | 70.5 | 64.3 | 69.9 | 67.5 | 65.7 | 69.4 | 71.1
Computer | 68.2 | 67.6 | 65.8 | 66.4 | 69.5 | 72.6 | 67.0 | 70.2 | 68.5 | 66.4
Electrical | 66.9 | 65.5 | 66.1 | 64.4 | 67.6 | 67.3 | 68.9 | 68.1 | 67.1 | 67.4
Mechanical | 65.5 | 64.8 | 65.6 | 63.7 | 65.6 | 65.3 | 68.1 | 68.6 | 64.9 | 62.7




Section 11.4 


In Exercises 11 and 12, use the Spearman rank correlation coefficient to test the 
claim by doing the following. 


(a) Identify the claim and state H₀ and Hₐ.

(b) Find the critical value. 

(c) Find the test statistic rₛ.

(d) Decide whether to reject or fail to reject the null hypothesis. 


(e) Interpret the decision in the context of the original claim. 


11. The table shows the overall scores and the prices for six randomly selected 
video disk players. The overall score is based mainly on picture quality. At 
α = 0.10, can you conclude that there is a significant correlation between the
overall score and the price? (Source: Consumer Reports) 


Overall score | 93 | 91 | 90 | 87 | 85 | 69
Price (in dollars) | 500 | 300 | 500 | 150 | 250 | 130


12. The table shows the overall scores and the prices per gallon for seven 
randomly selected interior paints. The overall score represents hiding, 
surface smoothness, and resistance to staining, scrubbing, gloss change, 
sticking, mildew, and fading. At α = 0.10, can you conclude that there is
a significant correlation between the overall score and the price? (Adapted 
from Consumer Reports) 


Overall score | 46 | 73 | 64 | 56 | 94 | 86 | 50
Price per gallon (in dollars) | 24 | 40 | 25 | 24 | 40 | 38 | 26


Section 11.5 


In Exercises 13 and 14, (a) identify the claim and state H₀ and Hₐ, (b) find the
critical values, (c) find the test statistic, (d) decide whether to reject or fail to reject
the null hypothesis, and (e) interpret the decision in the context of the original
claim. Use α = 0.05.


13. A highway patrol officer stops speeding vehicles on an interstate
highway. The genders of the last 25 drivers who were stopped are 
shown, where F represents a female driver and M represents a male 
driver. Can you conclude that the stops were not random by gender? 


FMMMFMFMFFFMM 
FFFMMMFMMFFM 


14. The sequence shows the departure status of the last 18 buses to leave a bus 
station, where T represents a bus that departed on time and L represents
a bus that departed late. Can you conclude that the departure status of the 
buses is not random? 


TTTTLLLLT 
LLLTTTTTT 


Chapter Quiz


Take this quiz as you would take a quiz in class. After you are done, check your 
work against the answers given in the back of the book. 


In Exercises 1-5, (a) identify the claim and state H₀ and Hₐ, (b) decide which
nonparametric test to use, (c) find the critical value(s), (d) find the test statistic, 
(e) decide whether to reject or fail to reject the null hypothesis, and (f) interpret 
the decision in the context of the original claim. 


1. An organization claims that the median number of annual volunteer 
hours is 52. In a random sample of 75 people who volunteered last year, 
47 volunteered for less than 52 hours, 23 volunteered for more than 52 hours, 
and 5 volunteered for 52 hours. At α = 0.05, can you reject the organization’s
claim? (Adapted from U.S. Bureau of Labor Statistics) 


2. A labor organization claims that there is a difference in the hourly
earnings of union workers and nonunion workers in state and local 
governments. The table shows the hourly earnings (in dollars) for a 
random sample of 10 union workers and 10 nonunion workers in state 
and local governments. At α = 0.10, can you support the organization’s
claim? (Adapted from U.S. Bureau of Labor Statistics) 


Union | 29.75 | 28.15 | 32.30 | 35.52 | 32.88 | 27.85 | 27.35 | 29.05 | 27.60 | 26.75
Nonunion | 26.15 | 23.10 | 21.20 | 26.95 | 22.05 | 24.75 | 22.50 | 22.25 | 21.40 | 20.45


3. The table shows the sales prices for a random sample of apartment
condominiums and cooperatives in four U.S. regions. At α = 0.01, can
you conclude that the distribution of the sales prices in at least one region 
is different from the others? (Adapted from National Association of Realtors) 


Region | Sales price (in thousands of dollars)
Northeast | 257.3 | 250.3 | 242.7 | 275.0 | 270.7 | 254.8 | 264.2 | 243.4
Midwest | 166.9 | 183.1 | 178.9 | 153.9 | 148.5 | 169.9 | 163.3 | 165.1
South | 181.3 | 156.7 | 155.6 | 170.4 | 175.3 | 196.3 | 178.4 | 166.8
West | 320.2 | 303.6 | 357.4 | 331.7 | 291.6 | 327.4 | 321.7 | 308.0


4. The table shows the numbers of emails sent and the numbers of emails 
received in a week for a random sample of nine people. At α = 0.01, can you
conclude that there is a significant correlation between the number of emails 
sent and the number of emails received? 


Emails sent | 30 | 30 | 25 | 26 | 24 | 18 | 18 | 25 | 28
Emails received | 32 | 36 | 21 | 22 | 20 | 20 | 22 | 23 | 23


5. A meteorologist wants to determine whether days with rain occur randomly
in April in his hometown. To do so, the meteorologist records whether it 
rains for each day in April. The results are shown, where R represents a 
day with rain and N represents a day with no rain. At α = 0.05, can the
meteorologist conclude that days with rain are not random? 


NRRNNNNRNRRNRRR 
NRRRRNNNNRNRNNR 


Chapter Test


Take this test as you would take a test in class. 


In Exercises 1-5, (a) identify the claim and state H₀ and Hₐ, (b) decide which
nonparametric test to use, (c) find the critical value(s), (d) find the test statistic,
(e) decide whether to reject or fail to reject the null hypothesis, and (f) interpret the
decision in the context of the original claim.


1. The mayor called on council members at a town meeting in the sequence
shown, where R represents a Republican council member and D represents
a Democratic council member. At α = 0.05, can you conclude that the
selection of members was not random?


RDDDRRDRDDRDDDRRD 
RRRRDRRRDDDRDRDRR 


2. An employment agency representative wants to determine whether there
is a difference in the annual household incomes in four regions of the
United States. The representative randomly selects several households
in each region and records the annual household income for each. The
table shows the results. At α = 0.01, can the representative conclude that
the distribution of the annual household incomes in at least one region is
different from the others? (Adapted from U.S. Census Bureau)


Region | Household income (in thousands of dollars)
Northeast | 64.2 | 57.0 | 65.6 | 64.7 | 59.9 | 62.4 | 61.5
Midwest | 56.0 | 61.1 | 51.9 | 55.2 | 57.4 | 58.5 | 58.7
South | 49.3 | 50.5 | 54.1 | 46.4 | 51.3 | 54.1 | 51.9
West | 64.0 | 61.9 | 58.6 | 60.7 | 59.6 | 61.2 | 63.1


3. An investment company claims that the median age of people with mutual
funds is 51 years. The ages (in years) of 20 randomly selected mutual
fund owners are listed below. At α = 0.01, is there enough evidence to
reject the company’s claim? (Adapted from Investment Company Institute)


46 34 33 27 58 64 54 36 38 42 
26 51 49 44 46 50 39 34 51 63 


4. An employment agency claims that there is a difference in the weekly earnings
of workers who are union members and workers who are not union members.
The table shows the weekly earnings (in dollars) for a random sample of nine
union members and eight nonunion members. At α = 0.05, can you support
the agency’s claim? (Adapted from U.S. Bureau of Labor Statistics)


Member | 951 | 1090 | 788 | 896 | 980 | 1087 | 1136 | 1000 | 890 | 919 | 1026
Nonmember | 850 | 783 | 954 | 649 | 747 | 906 | 895 | 730 | 790 | 687


5. The table shows the overall scores and the prices for a random sample of
eight different suitcases. The overall score represents the ease of use, features,
construction, and durability of a suitcase. At α = 0.05, can you conclude that
there is a significant correlation between the overall score and the price?
(Adapted from Consumer Reports)


Overall score | 90 | 85 | 81 | 78 | 72 | 68 | 64 | 61
Price (in dollars) | 495 | 230 | 190 | 160 | 350 | 230 | 260 | 200


Putting it all together


REAL STATISTICS, REAL DECISIONS


In a recent year, according to the Bureau of Labor Statistics, the
median number of years that wage and salary workers had been
with their current employer (called employee tenure) was 4.2 years.
Information on employee tenure has been gathered since 1996
using the Current Population Survey (CPS), a monthly survey of
about 60,000 households that provides information on employment,
unemployment, earnings, demographics, and other characteristics
of the U.S. population ages 16 and over. With respect to employee
tenure, the questions measure how long workers have been with
their current employers, not how long they plan to stay with
their employers.


www.bls.gov


EXERCISES


1. How Would You Do It?
(a) What sampling technique would you use to select the sample
for the CPS?
(b) Do you think the technique in part (a) will give you a sample
that is representative of the U.S. population? Why or why not?
(c) Identify possible flaws or biases in the survey on the basis of
the technique you chose in part (a).


2. Is There a Difference?
A congressional representative claims that the median tenure
for workers from the representative’s district is less than the
national median tenure of 4.2 years. The claim is based on the
representative’s data, which is shown in the Table for Exercise 2.
(Assume that the employees were randomly selected.)
(a) Is it possible that the claim is true? What questions should you
ask about how the data were collected?
(b) How would you test the representative’s claim? Can you use a
parametric test, or do you need to use a nonparametric test?
(c) State the null hypothesis and the alternative hypothesis.
(d) Test the claim using α = 0.05. What can you conclude?


TABLE FOR EXERCISE 2
Employee Tenure of 20 Workers
4.6 | 2.6 | 3.3
2.8 | 1.5 | 1.9
4.0 | 5.0 | 3.9
5.1 | 3.7 | 5.4
3.6 | 3.9 | 6.2
1.7 | 4.6 | 3.1
4.4 | 3.6


3. Comparing Male and Female Employee Tenures
A congressional representative claims that there is a difference
between the median tenures for male workers and female workers.
The claim is based on the representative’s data, which is shown in
the Table for Exercise 3. (Assume that the employees were randomly
selected from the representative’s district.)
(a) How would you test the representative’s claim? Can you use a
parametric test, or do you need to use a nonparametric test?
(b) State the null hypothesis and the alternative hypothesis.
(c) Test the claim using α = 0.05. What can you conclude?


TABLE FOR EXERCISE 3
Employee tenure of a sample of male workers | Employee tenure of a sample of female workers
3.9 | 4.4
4.4 | 4.9
4.7 | 5.4
4.3 | 4.3
4.9 | 4.0
3.8 | 1.8
3.6 | 5.1
4.7 | 5.1
2.3 | 3.3
6.5 | 2.2
0.9 | 5.2
5.1 | (illegible)


TECHNOLOGY


U.S. Income and Economic Research


The National Bureau of Economic Research (NBER) is a private, nonprofit,
nonpartisan research organization. The NBER provides information for better
understanding of how the U.S. economy works. Researchers at the NBER
concentrate on four types of empirical research: developing new statistical
measurements, estimating quantitative models of economic behavior, assessing
the effects of public policies on the U.S. economy, and projecting the effects of
alternative policy proposals.

One of the NBER’s interests is the median income of people in different regions
of the United States. The table below shows the annual incomes (in dollars) of a
random sample of people (15 years and over) in a recent year in four U.S. regions:
Northeast, Midwest, South, and West.


Annual income of people (in dollars)
Northeast | Midwest | South | West
45,481 | 25,781 | 19,946 | 37,922
31,922 | 28,326 | 35,140 | 31,198
27,750 | 26,910 | 33,323 | 24,129
(illegible) | (illegible) | (illegible) | (illegible)
24,304 | 32,945 | 18,030 | 34,924
32,216 | 32,119 | 24,251 | 22,491
30,393 | 30,990 | 24,581 | 28,668
28,897 | 44,317 | 32,005 | 42,207
25,981 | 18,021 | 37,091 | 24,465
20,439 | 42,193 | 33,866 | 20,776
40,562 | 25,054 | 21,746 | 28,521
48,863 | 27,703 | 26,324 | 37,422


In Exercises 1-5, refer to the annual incomes of people in the table. Use α = 0.05
for all tests.

1. Construct a box-and-whisker plot for each region. Do the median annual incomes
appear to differ between regions?
2. Use technology to perform a sign test to test the claim that the median annual
income in the Midwest is greater than $30,000.
3. Use technology to perform a Wilcoxon rank sum test to test the claim that the
median annual incomes in the Northeast and South are the same.
4. Use technology to perform a Kruskal-Wallis test to test the claim that the
distributions of annual incomes for all four regions are the same.
5. Use technology to perform a one-way ANOVA to test the claim that the average
annual incomes for all four regions are the same. Assume that the populations
of incomes are normally distributed, the samples are independent, and the
population variances are equal. How do your results compare with those in
Exercise 4?
6. Repeat Exercises 1, 3, 4, and 5 using the data in the table below. The table
shows the annual incomes (in dollars) of a random sample of families in a recent
year in four U.S. regions: Northeast, Midwest, South, and West.


Annual income of families (in dollars)
Northeast | Midwest | South | West
(illegible) | (illegible) | (illegible) | (illegible)
128,686 | 97,795 | 63,918 | 80,168
91,252 | 45,198 | 54,699 | 59,137
127,864 | 64,479 | 99,562 | 76,928
79,411 | 84,647 | 61,082 | 61,302
62,529 | 60,658 | 39,088 | 90,710
56,461 | 79,352 | 66,672 | 69,716
80,559 | 72,338 | 42,988 | 98,707
59,332 | 75,972 | 71,434 | 99,676
88,559 | 66,853 | 58,433 | 47,719
54,603 | 72,805 | 85,764 | 76,136
79,256 | 69,636 | 56,547 | 54,417
70,807 | 82,608 | 65,464 | 71,171
87,708 | 71,869 | 49,965 | 76,402
69,976 | 91,479 | 61,471 | 53,273


Extended solutions are given in the technology manuals that accompany this text.
Technical instruction is provided for Minitab, Excel, and the TI-84 Plus.




TRY IT YOURSELF ANSWERS 


Chapter 11 
Section 11.1 


1. There is enough evidence at the 2.5% level of significance to 
support the agency’s claim that the median number of days 
a home is on the market in its city is greater than 120. 

2. There is enough evidence at the 10% level of significance to 
reject the claim that the median age of museum workers in 
the United States is 46 years old. 

3. There is enough evidence at the 5% level of significance 
to support the researcher’s claim that a new vaccine will 
decrease the number of colds in adults. 


Section 11.2 


1. There is not enough evidence at the 1% level of significance 
to support the claim that the spray-on water repellent is 
effective. 

2. There is not enough evidence at the 5% level of significance 
to conclude that there is a difference in the claims paid by 
the companies. 


Section 11.3 


1. There is not enough evidence at the 5% level of significance 
to conclude that the distribution of the salaries in at least 
one state is different from the others. 


Section 11.4 


1. There is enough evidence at the 10% level of significance to 
conclude that there is a significant correlation between the 
oat and wheat prices. 


Section 11.5 


1. 13; 3, 1, 1, 1, 4, 2, 1, 1, 2, 3, 3, 1, 3

2. There is not enough evidence at the 5% level of significance 
to support the claim that the sequence of genders is not 
random. 

3. There is not enough evidence at the 5% level of significance 
to support the claim that the sequence of weather conditions 
is not random. 




ODD ANSWERS 


Chapter 11 
Section 11.1 (page 612) 


1. A nonparametric test is a hypothesis test that does not
require any specific conditions concerning the shapes
of population distributions or the values of population
parameters.
A nonparametric test is usually easier to perform than its
corresponding parametric test, but the nonparametric test is
usually less efficient.

3. When n is less than or equal to 25, the test statistic is equal
to x (the smaller number of + or − signs).
When n is greater than 25, the test statistic is equal to
$$z = \frac{(x + 0.5) - 0.5n}{\sqrt{n}/2}.$$

5. Verify that the sample is random. Identify the claim and
state H₀ and Hₐ. Identify the level of significance and sample
size. Find the critical value using Table 8 (n ≤ 25) or Table 4
(n > 25) in Appendix B. Calculate the test statistic. Make a
decision and interpret it in the context of the problem.

7. (a) H₀: median ≤ $300; Hₐ: median > $300 (claim)
(b) 1 (c) 5 (d) Fail to reject H₀.
(e) There is not enough evidence at the 1% level of
significance for the accountant to conclude that the
median credit card balance of college students is more
than $300.

9. (a) H₀: median ≤ $253,000 (claim)
Hₐ: median > $253,000
(b) 1 (c) 4 (d) Fail to reject H₀.
(e) There is not enough evidence at the 5% level of
significance to reject the agent’s claim that the median
sales price of new privately owned one-family homes
sold in a recent month is $253,000 or less.

11. (a) H₀: median ≥ $2300 (claim); Hₐ: median < $2300
(b) −2.05 (c) −1.47 (d) Fail to reject H₀.
(e) There is not enough evidence at the 2% level of
significance to reject the institution’s claim that the
median amount of credit card debt for families holding
such debts is at least $2300.

13. (a) H₀: median ≤ 30; Hₐ: median > 30 (claim)
(b) 4 (c) 10 (d) Fail to reject H₀.
(e) There is not enough evidence at the 1% level of
significance to support the research group’s claim that
the median age of the users of a social media website is
greater than 30 years old.

15. (a) H₀: median = 4 (claim); Hₐ: median ≠ 4
(b) ±1.96 (c) −2.54 (d) Reject H₀.
(e) There is enough evidence at the 5% level of significance
to reject the organization’s claim that the median
number of rooms in renter-occupied units is four.


17. (a) H₀: median = $41.93 (claim); Hₐ: median ≠ $41.93
(b) ±2.575 (c) −0.91 (d) Fail to reject H₀.
(e) There is not enough evidence at the 1% level of
significance to reject the labor organization’s claim that
the median hourly wage of computer systems analysts
is $41.93.
19. (a) H₀: The lower back pain intensity scores will not
decrease.
Hₐ: The lower back pain intensity scores will decrease.
(claim)
(b) 1 (c) 0 (d) Reject H₀.
(e) There is enough evidence at the 5% level of significance
to support the physician’s claim that lower back pain
intensity scores will decrease.

21. (a) H₀: The SAT scores will not improve.
Hₐ: The SAT scores will improve. (claim)
(b) 1 (c) 1 (d) Reject H₀.
(e) There is enough evidence at the 5% level of significance
to support the agency’s claim that the SAT scores will
improve.

23. (a) Reject H₀.

(b) There is enough evidence at the 5% level of significance 
to reject the claim that the proportion of adults who 
feel older is equal to the proportion of adults who feel 
younger. 

25. (a) H₀: median ≤ $765 (claim); Hₐ: median > $765
(b) 2.33 (c) 1.46 (d) Fail to reject H₀.
(e) There is not enough evidence at the 1% level of
significance to reject the organization’s claim that the
median weekly earnings of female workers is less than
or equal to $765.

27. (a) H₀: median ≤ 27 (claim); Hₐ: median > 27
(b) 1.645 (c) 1.30 (d) Fail to reject H₀.
(e) There is not enough evidence at the 5% level of
significance to reject the counselor’s claim that the
median age of brides at the time of their first marriage
is less than or equal to 27 years old.
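The sign test procedure described in the Section 11.1 answers above (discard values equal to the hypothesized median, let x be the smaller number of + or − signs, use x when n ≤ 25 and z otherwise) can be sketched as follows. The function name is hypothetical, and comparing the result with a critical value from Table 8 or Table 4 is left out because those tables are not reproduced here. The data are the customer counts from Review Exercise 1.

```python
import math


def sign_test(data, hypothesized_median):
    """Return (n, x, statistic) for the sign test.

    n: number of values that differ from the hypothesized median
    x: the smaller of the number of + signs and the number of - signs
    statistic: x itself when n <= 25, otherwise the z statistic
    """
    plus = sum(1 for v in data if v > hypothesized_median)
    minus = sum(1 for v in data if v < hypothesized_median)
    n = plus + minus  # values equal to the hypothesized median are discarded
    x = min(plus, minus)
    if n <= 25:
        return n, x, x
    z = ((x + 0.5) - 0.5 * n) / (math.sqrt(n) / 2)
    return n, x, z


customers = [675, 665, 601, 642, 554, 653, 639, 650, 645,
             550, 677, 569, 650, 660, 682, 689, 590]
print(sign_test(customers, 650))
```

For these data two values equal 650 and are dropped, leaving 7 plus signs and 8 minus signs, so the call returns (15, 7, 7).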


Section 11.2 (page 623) 


1. When the samples are dependent, use a Wilcoxon signed- 
rank test. When the samples are independent, use a Wilcoxon 
rank sum test. 

3. (a) H₀: There is no reduction in diastolic blood pressure.
(claim)
Hₐ: There is a reduction in diastolic blood pressure.
(b) Wilcoxon signed-rank test
(c) 10 (d) 17 (e) Fail to reject H₀.
(f) There is not enough evidence at the 1% level of
significance to reject the claim that there was no
reduction in diastolic blood pressure.


5. (a) H₀: There is no difference in the earnings.
Hₐ: There is a difference in the earnings. (claim)
(b) Wilcoxon rank sum test
(c) ±1.96 (d) 3.87 (e) Reject H₀.
(f) There is enough evidence at the 5% level of significance
to support the administrator’s claim that there is a
difference in the earnings of people with bachelor’s
degrees and those with advanced degrees.

7. (a) H₀: There is no difference in salaries.
Hₐ: There is a difference in salaries. (claim)
(b) Wilcoxon rank sum test
(c) ±1.96 (d) −2.71 (e) Reject H₀.
(f) There is enough evidence at the 5% level of significance
to support the representative’s claim that there is a
difference in the salaries earned by teachers in Wisconsin
and Michigan.

9. Reject H₀. There is enough evidence at the 10% level
of significance for the engineer to conclude that the gas
mileage is improved.
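The Section 11.2 answers distinguish the signed-rank test (dependent samples) from the rank sum test (independent samples). A sketch of the signed-rank statistic follows, assuming the usual procedure: drop zero differences, rank the absolute differences (averaging ties), attach the sign of each difference, and take w_s as the smaller of the absolute values of the positive and negative rank sums. The function names are hypothetical, and the paired blood pressure data from Review Exercise 4 are reused purely as an illustration.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def signed_rank_ws(before, after):
    """Wilcoxon signed-rank statistic w_s for paired (dependent) samples."""
    diffs = [x - y for x, y in zip(before, after) if x != y]  # zero differences are dropped
    ranks = average_ranks([abs(d) for d in diffs])
    positive = sum(r for d, r in zip(diffs, ranks) if d > 0)
    negative = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return min(positive, negative)  # w_s is the smaller absolute rank sum


before = [107, 110, 123, 129, 112, 111, 107, 112, 136, 102]
after = [100, 114, 105, 112, 115, 116, 106, 102, 125, 104]
print(signed_rank_ws(before, after))
```

Here the four negative differences (−4, −3, −5, −2) have ranks summing to 14 and the positive ones sum to 41, so w_s = 14.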


Section 11.3 (page 631) 


1. The conditions for using a Kruskal-Wallis test are that the
samples must be random and independent, and the size of
each sample must be at least 5.


(a) H₀: The distribution of the annual premiums is the same
in all three states.
Hₐ: The distribution of the annual premiums in at least
one state is different from the others. (claim)
(b) χ²₀ = 5.991; Rejection region: χ² > 5.991
(c) 11.807 (d) Reject H₀.
(e) There is enough evidence at the 5% level of significance
to conclude that the distribution of annual premiums in
at least one state is different from the others.


(a) H₀: The distribution of the annual salaries is the same in
all four states.
Hₐ: The distribution of the annual salaries in at least one
state is different from the others. (claim)

(b) χ²₀ = 6.251; Rejection region: χ² > 6.251

(c) 3.667 (d) Fail to reject H₀.

(e) There is not enough evidence at the 10% level of
significance to conclude that the distribution of annual
salaries in at least one state is different from the others.


(a) Fail to reject H₀.

(b) Fail to reject H₀.

(c) Both tests come to the same decision, which is that
there is not enough evidence to support the claim that
the number of days patients spend in the hospital is
different in at least one region.
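The Kruskal-Wallis answers above compare a test statistic (for instance, 11.807) with a chi-square critical value. As a rough illustration of where such a statistic comes from, the Python sketch below computes H = 12/(N(N + 1)) × Σ(Rᵢ²/nᵢ) - 3(N + 1) from the ranks of the combined data. It is a minimal sketch, assuming tied values receive the average of their ranks and that no further tie correction is applied; the helper names and the sample data are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(*samples):
    """H = 12/(N(N+1)) * sum(R_i^2 / n_i) - 3(N+1), ranking the combined data."""
    combined = [x for s in samples for x in s]
    ranks = average_ranks(combined)
    n_total = len(combined)
    total = 0.0
    start = 0
    for s in samples:
        rank_sum = sum(ranks[start:start + len(s)])  # R_i for this sample
        total += rank_sum ** 2 / len(s)
        start += len(s)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


# Hypothetical data: two samples of five values each.
h = kruskal_wallis_h([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print(round(h, 3))  # 6.818
```

With k = 2 samples, this H would be compared with the chi-square critical value for k - 1 = 1 degree of freedom.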


Section 11.4 (page 636) 


1. The Spearman rank correlation coefficient can be used to
describe the relationship between linear or nonlinear data.
Also, it can be used for data at the ordinal level, and it is
easier to calculate by hand than the Pearson correlation
coefficient.


3. The ranks of the corresponding data are exactly identical
when rₛ is equal to 1. The ranks are in “reverse” order when
rₛ is equal to -1. The ranks have no relationship when rₛ is
equal to 0.


5. (a) H₀: ρₛ = 0; Hₐ: ρₛ ≠ 0 (claim)

(b) 0.738 (c) 0.857 (d) Reject H₀.

(e) There is enough evidence at the 5% level of significance
to conclude that there is a significant correlation
between purchased seed expenses and fertilizer and
lime expenses.


7. (a) H₀: ρₛ = 0; Hₐ: ρₛ ≠ 0 (claim)

(b) 0.700 (c) 0.500 (d) Fail to reject H₀.

(e) There is not enough evidence at the 5% level of
significance to conclude that there is a significant
correlation between the barley and corn prices.


9. Reject H₀. There is enough evidence at the 10% level of
significance to conclude that there is a significant correlation
between science achievement scores and GNI.

11. Reject H₀. There is enough evidence at the 10% level of
significance to conclude that there is a significant correlation
between science and mathematics achievement scores.

13. Reject H₀. There is enough evidence at the 10% level of
significance to conclude that there is a significant correlation
between average hours worked and the number of on-the-job
injuries.
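The answers above describe what the Spearman rank correlation coefficient measures, including the extreme cases rₛ = 1 (identical ranks) and rₛ = -1 (ranks in reverse order). A minimal Python sketch of the usual shortcut formula rₛ = 1 - 6Σd²/(n(n² - 1)), where d is the difference between paired ranks, makes those cases concrete. It assumes there are no tied values; the function names and data are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes all values are distinct."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman_rs(x, y):
    """Spearman rank correlation via r_s = 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


print(spearman_rs([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))  # 1.0 (identical ranks)
print(spearman_rs([1, 2, 3, 4, 5], [50, 40, 30, 20, 10]))  # -1.0 (reverse order)
```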


Section 11.5 (page 645) 


1. Sample answer: It is called the runs test because it considers
the number of runs of data in a sample to determine
whether the sequence of data was randomly selected.


3. Number of runs: 8

Run lengths: 1, 1, 1, 1, 3, 3, 1, 1


5. Number of runs: 9

Run lengths: 1, 1, 1, 1, 1, 6, 3, 2, 4

7. n₁ = number of T’s = 6

n₂ = number of F’s = 6

9. n₁ = number of M’s = 10

n₂ = number of F’s = 10


11. too high: 11; too low: 3

13. too high: 14; too low: 5

15. (a) H₀: The coin tosses were random.

Hₐ: The coin tosses were not random. (claim)

(b) lower critical value =
upper critical value = 14

(c) 9 (d) Fail to reject H₀.

(e) There is not enough evidence at the 5% level of
significance to support the claim that the coin tosses
were not random.

17. (a) H₀: The sequence of leagues of winning teams is
random.
Hₐ: The sequence of leagues of winning teams is not
random. (claim)

(b) ±1.96 (c) 1.95 (d) Fail to reject H₀.

(e) There is not enough evidence at the 5% level of
significance to conclude that the sequence of leagues of
World Series winning teams is not random.
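The answers above count runs, run lengths, and the sample sizes n₁ and n₂, and the World Series answer uses a z-test. The Python sketch below shows one way to do that counting, together with the usual large-sample normal approximation for the number of runs G: μ_G = 2n₁n₂/(n₁ + n₂) + 1 and σ_G = √(2n₁n₂(2n₁n₂ - n₁ - n₂)/((n₁ + n₂)²(n₁ + n₂ - 1))). It assumes the sequence contains exactly two kinds of outcomes; the function names and the coin-toss sequence are hypothetical.

```python
import math


def runs(sequence):
    """Return the list of run lengths in a sequence of outcomes."""
    lengths = []
    for i, item in enumerate(sequence):
        if i > 0 and item == sequence[i - 1]:
            lengths[-1] += 1   # same outcome as before: extend the current run
        else:
            lengths.append(1)  # different outcome: start a new run
    return lengths


def runs_z(sequence):
    """Large-sample z for the number of runs; assumes exactly two kinds of outcomes."""
    kinds = sorted(set(sequence))
    n1 = sequence.count(kinds[0])
    n2 = sequence.count(kinds[1])
    g = len(runs(sequence))
    mu = 2 * n1 * n2 / (n1 + n2) + 1
    sigma = math.sqrt(2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
                      / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    return (g - mu) / sigma


seq = "TTFTFFFT"  # hypothetical coin tosses
print(runs(seq))       # [2, 1, 1, 3, 1]
print(len(runs(seq)))  # 5
```

The z-value is only meaningful for large samples; for small samples like this one, the answers compare the number of runs with tabled critical values instead.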


19. (a) H₀: The microchips are random by gender. (claim)
Hₐ: The microchips are not random by gender.

(b) lower critical value = 8
upper critical value = 18

(c) 12 (d) Fail to reject H₀.

(e) There is not enough evidence at the 5% level of
significance to reject the claim that the microchips are
random by gender.

21. Fail to reject H₀. There is not enough evidence at the
5% level of significance to support the claim that the daily
high temperatures do not occur randomly.

23. Answers will vary. 


Uses and Abuses for Chapter 11 (page 647) 


1. Answers will vary. 
2. Sign test → z-test or t-test
Paired-sample sign test → t-test
Wilcoxon signed-rank test → t-test
Wilcoxon rank sum test → z-test or t-test
Kruskal-Wallis test → one-way ANOVA
Spearman rank correlation coefficient → Pearson correlation
coefficient


Review Exercises for Chapter 11 (page 649) 


1. (a) H₀: median ≤ 650 (claim); Hₐ: median > 650

(b) 2 (c) 7 (d) Fail to reject H₀.

(e) There is not enough evidence at the 1% level of
significance to reject the store manager’s claim that
the median number of customers per day is no more
than 650.

3. (a) H₀: median = 2 (claim); Hₐ: median ≠ 2

(b) -1.645 (c) -3.26 (d) Reject H₀.

(e) There is enough evidence at the 10% level of significance
to reject the agency’s claim that the median sentence
length for all federal prisoners is 2 years.
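Review Exercise 3 reports a standardized test statistic (-3.26) for a sign test. For large samples, the sign test is commonly carried out with the normal approximation z = ((x + 0.5) - 0.5n)/(√n/2), where x is the smaller of the numbers of + and - signs and n is the total number of nonzero signs. The short Python sketch below assumes that form; the function name and the counts are hypothetical.

```python
import math


def sign_test_z(x, n):
    """Large-sample sign test statistic (with a continuity correction).

    x: the smaller of the number of + signs and the number of - signs
    n: the total number of + and - signs (zero differences are discarded)
    """
    return ((x + 0.5) - 0.5 * n) / (math.sqrt(n) / 2)


# Hypothetical counts: 100 nonzero signs, the smaller count being 40.
print(round(sign_test_z(40, 100), 2))  # -1.9
```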

5. (a) H₀: There was no reduction in diastolic blood pressure.
(claim)

Hₐ: There was a reduction in diastolic blood pressure.

(b) 2 (c) 3 (d) Fail to reject H₀.

(e) There is not enough evidence at the 5% level of
significance to reject the claim that there was no
reduction in diastolic blood pressure.

7. (a) H₀: There is no difference in the total times required
to earn a doctorate degree by female and male
graduate students.

Hₐ: There is a difference in the total times required
to earn a doctorate degree by female and male
graduate students. (claim)

(b) Wilcoxon rank sum test (c) 2.575

(d) -1.357 (or 1.357) (e) Fail to reject H₀.

(f) There is not enough evidence at the 1% level of
significance to support the claim that there is a difference
in the total times required to earn a doctorate degree by
female and male graduate students.
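Review Exercise 7 reports a rank sum test statistic of -1.357 (or 1.357), the sign depending on which sample's ranks are summed. A Python sketch of the large-sample Wilcoxon rank sum statistic, z = (R - μ_R)/σ_R with μ_R = n₁(n₁ + n₂ + 1)/2 and σ_R = √(n₁n₂(n₁ + n₂ + 1)/12), follows. It assumes there are no ties and takes R to be the rank sum of the smaller sample (a common convention); the function name is hypothetical, and the tiny data set only exercises the arithmetic, since the approximation is meant for larger samples.

```python
import math


def rank_sum_z(sample1, sample2):
    """Large-sample Wilcoxon rank sum statistic z = (R - mu_R) / sigma_R.

    Assumes no tied values; R is the rank sum of the smaller sample.
    """
    if len(sample2) < len(sample1):
        sample1, sample2 = sample2, sample1  # make sample1 the smaller sample
    n1, n2 = len(sample1), len(sample2)
    combined = sorted(list(sample1) + list(sample2))
    # With no ties, each value's rank is its 1-based position in the sorted data.
    r = sum(combined.index(v) + 1 for v in sample1)
    mu_r = n1 * (n1 + n2 + 1) / 2
    sigma_r = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (r - mu_r) / sigma_r


# Hypothetical samples (far smaller than the test is intended for).
print(round(rank_sum_z([1, 2, 3], [4, 5, 6, 7]), 3))  # -2.121
```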




9. (a) H₀: The distribution of the ages of the doctorate
recipients is the same in all three fields of study.

Hₐ: The distribution of the ages of the doctorate
recipients in at least one field of study is different
from the others. (claim)

(b) χ²₀ = 9.210; Rejection region: χ² > 9.210

(c) 6.741 (d) Fail to reject H₀.

(e) There is not enough evidence at the 1% level of
significance to conclude that the distribution of the ages
of the doctorate recipients in at least one field of study
is different from the others.


11. (a) H₀: ρₛ = 0; Hₐ: ρₛ ≠ 0 (claim)

(b) 0.829 (c) 0.843 (d) Reject H₀.

(e) There is enough evidence at the 10% level of significance
to conclude that there is a significant correlation
between the overall score and the price.


13. (a) H₀: The traffic stops were random by gender.

Hₐ: The traffic stops were not random by gender. (claim)
(b) lower critical value = 8
upper critical value = 19
(c) 14 (d) Fail to reject H₀.
(e) There is not enough evidence at the 5% level of
significance to support the claim that the stops were not
random by gender.


Quiz for Chapter 11 (page 653) 
1. (a) H₀: median = 52 (claim); Hₐ: median ≠ 52

(b) Sign test (c) ±1.96 (d) -2.75 (e) Reject H₀.

(f) There is enough evidence at the 5% level of significance
to reject the organization’s claim that the median
number of annual volunteer hours is 52.


2. (a) H₀: There is no difference in the hourly earnings.

Hₐ: There is a difference in the hourly earnings. (claim)

(b) Wilcoxon rank sum test

(c) ±1.645 (d) -3.70 (or 3.70) (e) Reject H₀.

(f) There is enough evidence at the 10% level of significance
to support the organization’s claim that there is a
difference in the hourly earnings of union workers and
nonunion workers in state and local governments.


3. (a) H₀: The distribution of the sales prices is the same in all
four regions.
Hₐ: The distribution of the sales prices in at least one
region is different from the others. (claim)
(b) Kruskal-Wallis test
(c) 11.345 (d) 26.412 (e) Reject H₀.
(f) There is enough evidence at the 1% level of significance
to conclude that the distribution of the sales prices in at
least one region is different from the others.


4. (a) H₀: ρₛ = 0; Hₐ: ρₛ ≠ 0 (claim)

(b) Spearman rank correlation coefficient

(c) 0.833 (d) 0.829 (e) Fail to reject H₀.

(f) There is not enough evidence at the 1% level of
significance to conclude that there is a significant
correlation between the number of emails sent and the
number of emails received.




5. (a) H₀: The days with rain are random.

Hₐ: The days with rain are not random. (claim)

(b) Runs test

(c) lower critical value = 10
upper critical value = 22

(d) 16 (e) Fail to reject H₀.

(f) There is not enough evidence at the 5% level of
significance for the meteorologist to conclude that days
with rain are not random.


Real Statistics—Real Decisions for Chapter 11 (page 655) 


1. (a)-(c) Answers will vary. 

2. (a) Answers will vary. 

(b) Sign test; You need to use the nonparametric test 
because nothing is known about the shape of the 
population distribution. 

(c) H₀: median ≥ 4.2; Hₐ: median < 4.2 (claim)

(d) Fail to reject H₀. There is not enough evidence at the
5% level of significance to support the claim that the
median tenure for workers from the representative’s
district is less than 4.2 years.

3. (a) Wilcoxon rank sum test; You need to use the
nonparametric test because nothing is known about the
shape of the population.

(b) H₀: There is no difference between the median tenures
for male workers and female workers.
Hₐ: There is a difference between the median tenures for
male workers and female workers. (claim)

(c) Fail to reject H₀. There is not enough evidence at the
5% level of significance to support the claim that there
is a difference between the median tenures for male
workers and female workers.


APPENDIX A 


In this appendix, we use a 0-to-z table as an alternative development of the standard 
normal distribution. It is intended that this appendix be used after completion of the 
“Properties of a Normal Distribution” subsection of Section 5.1 in the text. If used, 
this appendix should replace the material in the “Standard Normal Distribution” 
subsection of Section 5.1 except for the exercises. 


Standard Normal Distribution (0-to-z) 


z      .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
0.0  .0000  .0040  .0080  .0120  .0160  .0199  .0239  .0279  .0319  .0359
0.1  .0398  .0438  .0478  .0517  .0557  .0596  .0636  .0675  .0714  .0753
0.2  .0793  .0832  .0871  .0910  .0948  .0987  .1026  .1064  .1103  .1141
0.3  .1179  .1217  .1255  .1293  .1331  .1368  .1406  .1443  .1480  .1517
0.4  .1554  .1591  .1628  .1664  .1700  .1736  .1772  .1808  .1844  .1879
0.5  .1915  .1950  .1985  .2019  .2054  .2088  .2123  .2157  .2190  .2224
0.6  .2257  .2291  .2324  .2357  .2389  .2422  .2454  .2486  .2517  .2549
0.7  .2580  .2611  .2642  .2673  .2704  .2734  .2764  .2794  .2823  .2852
0.8  .2881  .2910  .2939  .2967  .2995  .3023  .3051  .3078  .3106  .3133
0.9  .3159  .3186  .3212  .3238  .3264  .3289  .3315  .3340  .3365  .3389
1.0  .3413  .3438  .3461  .3485  .3508  .3531  .3554  .3577  .3599  .3621
1.1  .3643  .3665  .3686  .3708  .3729  .3749  .3770  .3790  .3810  .3830
1.2  .3849  .3869  .3888  .3907  .3925  .3944  .3962  .3980  .3997  .4015
1.3  .4032  .4049  .4066  .4082  .4099  .4115  .4131  .4147  .4162  .4177
1.4  .4192  .4207  .4222  .4236  .4251  .4265  .4279  .4292  .4306  .4319
1.5  .4332  .4345  .4357  .4370  .4382  .4394  .4406  .4418  .4429  .4441
1.6  .4452  .4463  .4474  .4484  .4495  .4505  .4515  .4525  .4535  .4545
1.7  .4554  .4564  .4573  .4582  .4591  .4599  .4608  .4616  .4625  .4633
1.8  .4641  .4649  .4656  .4664  .4671  .4678  .4686  .4693  .4699  .4706
1.9  .4713  .4719  .4726  .4732  .4738  .4744  .4750  .4756  .4761  .4767
2.0  .4772  .4778  .4783  .4788  .4793  .4798  .4803  .4808  .4812  .4817
2.1  .4821  .4826  .4830  .4834  .4838  .4842  .4846  .4850  .4854  .4857
2.2  .4861  .4864  .4868  .4871  .4875  .4878  .4881  .4884  .4887  .4890
2.3  .4893  .4896  .4898  .4901  .4904  .4906  .4909  .4911  .4913  .4916
2.4  .4918  .4920  .4922  .4925  .4927  .4929  .4931  .4932  .4934  .4936
2.5  .4938  .4940  .4941  .4943  .4945  .4946  .4948  .4949  .4951  .4952
2.6  .4953  .4955  .4956  .4957  .4959  .4960  .4961  .4962  .4963  .4964
2.7  .4965  .4966  .4967  .4968  .4969  .4970  .4971  .4972  .4973  .4974
2.8  .4974  .4975  .4976  .4977  .4977  .4978  .4979  .4979  .4980  .4981
2.9  .4981  .4982  .4982  .4983  .4984  .4984  .4985  .4985  .4986  .4986
3.0  .4987  .4987  .4987  .4988  .4988  .4989  .4989  .4989  .4990  .4990
3.1  .4990  .4991  .4991  .4991  .4992  .4992  .4992  .4992  .4993  .4993
3.2  .4993  .4993  .4994  .4994  .4994  .4994  .4994  .4995  .4995  .4995
3.3  .4995  .4995  .4995  .4996  .4996  .4996  .4996  .4996  .4996  .4997
3.4  .4997  .4997  .4997  .4997  .4997  .4997  .4997  .4997  .4997  .4998


Reprinted with permission of Frederick Mosteller 


A2 APPENDIX A Alternative Presentation of the Standard Normal Distribution


A Alternative Presentation of the Standard Normal Distribution 


What You Should Learn 


» How to find areas under the 
standard normal curve 


Study Tip
Because every normal distribution can be transformed to the
standard normal distribution, you can use z-scores and the
standard normal curve to find areas (and therefore probabilities)
under any normal curve.


Study Tip

It is important that you know the difference between x and z.
The random variable x is sometimes called a raw score and
represents values in a nonstandard normal distribution, whereas
z represents values in the standard normal distribution.


The Standard Normal Distribution 


There are infinitely many normal distributions, each with its own mean and 
standard deviation. The normal distribution with a mean of 0 and a standard 
deviation of 1 is called the standard normal distribution. The horizontal scale 
of the graph of the standard normal distribution corresponds to z-scores. In 
Section 2.5, you learned that a z-score is a measure of position that indicates the 
number of standard deviations a value lies from the mean. Recall that you can 
transform an x-value to a z-score using the formula 


$$z = \frac{\text{Value} - \text{Mean}}{\text{Standard deviation}} = \frac{x - \mu}{\sigma}$$

Round to the nearest hundredth.
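As a small check on the formula, the Python sketch below converts a raw score x into a z-score and rounds the result to the nearest hundredth, as the text instructs. The function name and the values of x, μ, and σ are hypothetical.

```python
def z_score(x, mean, std_dev):
    """Transform a raw score x into a z-score, rounded to the nearest hundredth."""
    return round((x - mean) / std_dev, 2)


# Hypothetical values: x = 74 from a distribution with mean 70 and standard deviation 3.
print(z_score(74, 70, 3))  # 1.33
```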


DEFINITION 


The standard normal distribution is a normal distribution with a mean of 0 
and a standard deviation of 1. The total area under its normal curve is 1. 


Standard Normal Distribution 


When each data value of a normally distributed random variable x is 
transformed into a z-score, the result will be the standard normal distribution. 
After this transformation takes place, the area that falls in the interval under the 
nonstandard normal curve is the same as that under the standard normal curve 
within the corresponding z-boundaries. 

In Section 2.4, you learned to use the Empirical Rule to approximate areas
under a normal curve when the values of the random variable x corresponded
to -3, -2, -1, 0, 1, 2, or 3 standard deviations from the mean. Now, you will
learn to calculate areas corresponding to other x-values. After you use the
formula above to transform an x-value to a z-score, you can use the Standard
Normal Table (0-to-z) on page A1. The table lists the area under the standard
normal curve between 0 and the given z-score. As you examine the table, notice
the following.


Properties of the Standard Normal Distribution 


1. The distribution is symmetric about the mean (z = 0).
2. The area under the standard normal curve to the left of z = 0 is 0.5 and
the area to the right of z = 0 is 0.5.
3. The area under the standard normal curve between 0 and z increases as the
distance between 0 and z increases.




At first glance, the table on page A1 appears to give areas for positive
z-scores only. However, because of the symmetry of the standard normal curve,
the table also gives areas for negative z-scores (see Example 1).


Using the Standard Normal Table (0-to-z) 


1. Find the area under the standard normal curve between z = 0 and
z = 1.15.


2. Find the z-scores that correspond to an area of 0.0948. 


SOLUTION 

1. Find the area that corresponds to z = 1.15 by finding 1.1 in the left column
and then moving across the row to the column under 0.05. The number in
that row and column is 0.3749. So, the area between z = 0 and z = 1.15
is 0.3749, as shown in the figure at the left.

[Figure: standard normal curve with the area between z = 0 and z = 1.15 shaded (Area = 0.3749), shown beside an excerpt of the Standard Normal Table (0-to-z)]


2. Find the z-scores that correspond to an area of 0.0948 by locating 0.0948
in the table. The values at the beginning of the corresponding row and
at the top of the corresponding column give the z-score. For an area of
0.0948, the row value is 0.2 and the column value is 0.04. So, the z-scores are
z = -0.24 and z = 0.24, as shown in the figures at the left.

[Figures: standard normal curves with the area between z = -0.24 and z = 0 and the area between z = 0 and z = 0.24 shaded (Area = 0.0948 each), shown beside an excerpt of the Standard Normal Table (0-to-z)]


TRY IT YOURSELF 1
1. Find the area under the standard normal curve between z = 0 and z = 2.19.
2. Find the z-scores that correspond to an area of 0.4850.

[TI-84 Plus screen: area for Part 1 of Example 1]

Answer: Page A39


When the z-score is not in the table, use the entry closest to it. When the 
z-score is exactly midway between two z-scores, use the area midway between 
the corresponding areas. In addition to using the table, you can use technology 
to find the area under the standard normal curve that corresponds to a z-score, 
as shown at the left using a TI-84 Plus for Part 1 of Example 1. 
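The text uses a TI-84 Plus for this step. Purely as an illustration in another tool, the Python sketch below uses the standard library: the 0-to-z area equals 0.5 × erf(|z|/√2), and `statistics.NormalDist().inv_cdf` reverses the lookup. The function names are hypothetical, and, as with any technology, results can differ slightly from four-place table entries.

```python
import math
from statistics import NormalDist


def area_zero_to_z(z):
    """Area under the standard normal curve between 0 and z (the 0-to-z table entry)."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))


def z_for_area(area):
    """Positive z whose 0-to-z area equals `area` (0 <= area < 0.5); -z has the same area."""
    return NormalDist().inv_cdf(0.5 + area)


print(round(area_zero_to_z(1.15), 4))  # 0.3749 (Part 1 of Example 1)
print(round(z_for_area(0.0948), 2))    # 0.24, so z = -0.24 and z = 0.24 (Part 2)
```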




You can use the following guidelines to find various types of areas under the 
standard normal curve. 


GUIDELINES 


Finding Areas Under the Standard Normal Curve 

1. Sketch the standard normal curve and shade the appropriate area under 
the curve. 

2. Use the Standard Normal Table (0-to-z) on page A1 to find the area that 
corresponds to the z-score(s). 

3. Find the area by following the directions for each case shown. 


a. Area to the left of z
i. When z < 0, subtract the area from 0.5.
   1. The area between z = 0 and z = -1.23 is 0.3907.
   2. Subtract to find the area to the left of z = -1.23:
      0.5 - 0.3907 = 0.1093.
ii. When z > 0, add 0.5 to the area.
   1. The area between z = 0 and z = 1.23 is 0.3907.
   2. Add to find the area to the left of z = 1.23:
      0.5 + 0.3907 = 0.8907.

b. Area to the right of z
i. When z < 0, add 0.5 to the area.
   1. The area between z = 0 and z = -1.23 is 0.3907.
   2. Add to find the area to the right of z = -1.23:
      0.5 + 0.3907 = 0.8907.
ii. When z > 0, subtract the area from 0.5.
   1. The area between z = 0 and z = 1.23 is 0.3907.
   2. Subtract to find the area to the right of z = 1.23:
      0.5 - 0.3907 = 0.1093.

c. Area between two z-scores
i. When the two z-scores have the same sign (both positive or both negative),
   subtract the smaller area from the larger area.
   1. The area between z = 0 and z₁ = 1.23 is 0.3907.
   2. The area between z = 0 and z₂ = 2.5 is 0.4938.
   3. Subtract to find the area between z₁ = 1.23 and z₂ = 2.5:
      0.4938 - 0.3907 = 0.1031.
ii. When the two z-scores have opposite signs (one negative and one positive),
   add the areas.
   1. The area between z = 0 and z₁ = 1.23 is 0.3907.
   2. The area between z = 0 and z₂ = -0.5 is 0.1915.
   3. Add to find the area between z₁ = 1.23 and z₂ = -0.5:
      0.3907 + 0.1915 = 0.5822.
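The three cases above can be written as small functions. This Python sketch is a minimal illustration, assuming the 0-to-z area is computed with `math.erf` rather than read from the table, so results can differ from the table in the fourth decimal place (for instance, it gives about 0.5821 rather than 0.5822 for the last case). The function names are hypothetical.

```python
import math


def area_zero_to_z(z):
    """Area between 0 and z under the standard normal curve (the 0-to-z table entry)."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))


def area_left(z):
    # Case a: subtract the area from 0.5 when z < 0; add 0.5 when z > 0.
    a = area_zero_to_z(z)
    return 0.5 - a if z < 0 else 0.5 + a


def area_right(z):
    # Case b: add 0.5 when z < 0; subtract the area from 0.5 when z > 0.
    a = area_zero_to_z(z)
    return 0.5 + a if z < 0 else 0.5 - a


def area_between(z1, z2):
    # Case c: same sign -> subtract the smaller area from the larger;
    # opposite signs -> add the areas.
    a1, a2 = area_zero_to_z(z1), area_zero_to_z(z2)
    if z1 * z2 >= 0:
        return abs(a1 - a2)
    return a1 + a2


print(round(area_left(-1.23), 4))         # 0.1093
print(round(area_right(1.23), 4))         # 0.1093
print(round(area_between(1.23, 2.5), 4))  # 0.1031
```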


[TI-84 Plus screens: areas for Examples 2 and 3]




Finding Area Under the Standard Normal Curve 


Find the area under the standard normal curve to the left of z = -0.99.

SOLUTION

The area under the standard normal curve to the left of z = -0.99 is shown.

[Figure: standard normal curve with the area between z = -0.99 and z = 0 labeled Area = 0.3389 and the area to the left of z = -0.99 labeled Area = 0.5 - 0.3389]

From the Standard Normal Table (0-to-z), the area corresponding to z = -0.99
is 0.3389. Because the area to the left of z = 0 is 0.5, the area to the left of
z = -0.99 is

Area = 0.5 - 0.3389 = 0.1611.


You can use technology to find the area to the left of z = -0.99, as shown at
the left.
TRY IT YOURSELF 2 


Find the area under the standard normal curve to the left of z = 2.13. 
Answer: Page A39 


Finding Area Under the Standard Normal Curve 
Find the area under the standard normal curve to the right of z = 1.06. 


SOLUTION 
The area under the standard normal curve to the right of z = 1.06 is shown.

[Figure: standard normal curve with the area between z = 0 and z = 1.06 labeled Area = 0.3554 and the area to the right of z = 1.06 labeled Area = 0.5 - 0.3554]

From the Standard Normal Table (0-to-z), the area corresponding to z = 1.06
is 0.3554. Because the area to the right of z = 0 is 0.5, the area to the right of
z = 1.06 is

Area = 0.5 - 0.3554 = 0.1446.


You can use technology to find the area to the right of z = 1.06, as shown at 
the left. 
TRY IT YOURSELF 3 


Find the area under the standard normal curve to the right of z = -2.16.
Answer: Page A39 




[TI-84 Plus screen: area between z = -1.5 and z = 1.25 for Example 4]


Finding Area Under the Standard Normal Curve 


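Find the area under the standard normal curve between z = -1.5 and
z = 1.25.

SOLUTION

The area under the standard normal curve between z = -1.5 and z = 1.25 is
shown.

[Figure: standard normal curve with the area between z = -1.5 and z = 0 labeled Area = 0.4332 and the area between z = 0 and z = 1.25 labeled Area = 0.3944; total Area = 0.4332 + 0.3944]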


From the Standard Normal Table (0-to-z), the area corresponding to z = -1.5
is 0.4332 and the area corresponding to z = 1.25 is 0.3944. To find the area
between these two z-scores, add the resulting areas.


Area = 0.4332 + 0.3944 = 0.8276 


Note that when you use technology, your answers may differ slightly from
those found using the Standard Normal Table. For instance, when finding the
area between z = -1.5 and z = 1.25 on a TI-84 Plus, you get the result shown
at the left.

Interpretation So, 82.76% of the area under the curve falls between z = -1.5
and z = 1.25.
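The small difference mentioned above comes from rounding: each table entry is rounded to four decimal places before the two are added. A Python sketch, using `statistics.NormalDist` from the standard library as a stand-in for the calculator, shows the effect.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Area between z = -1.5 and z = 1.25 taken directly from the cumulative
# distribution function, instead of adding two rounded table entries.
exact = std_normal.cdf(1.25) - std_normal.cdf(-1.5)
from_table = 0.4332 + 0.3944  # the two 0-to-z table entries used in the example

print(round(exact, 4))       # 0.8275
print(round(from_table, 4))  # 0.8276
```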

TRY IT YOURSELF 4 

Find the area under the standard normal curve between z = -2.165 and
z = -1.35.


Answer: Page A39 


Because the normal distribution is a continuous probability distribution, the
area under the standard normal curve to the left of a z-score gives the probability
that z is less than that z-score. For instance, in Example 2, the area to the left
of z = -0.99 is 0.1611. So, P(z < -0.99) = 0.1611, which is read as “the
probability that z is less than -0.99 is 0.1611.” The table shows the probabilities
for Examples 3 and 4.


            Area                                    Probability
Example 3   To the right of z = 1.06: 0.1446        P(z > 1.06) = 0.1446
Example 4   Between z = -1.5 and z = 1.25: 0.8276   P(-1.5 < z < 1.25) = 0.8276


Recall from Section 2.4 that values lying more than two standard deviations
from the mean are considered unusual. Values lying more than three standard
deviations from the mean are considered very unusual. So, a z-score greater
than 2 or less than -2 is unusual. A z-score greater than 3 or less than -3 is
very unusual.
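These cutoffs translate directly into a rule. A minimal Python sketch, with a hypothetical function name, labels a z-score by its distance from the mean. A value beyond ±3 is reported as "very unusual," the more specific of the two labels that apply to it.

```python
def classify_z(z):
    """Label a z-score using the cutoffs of 2 and 3 standard deviations."""
    if abs(z) > 3:
        return "very unusual"
    if abs(z) > 2:
        return "unusual"
    return "not unusual"


for z in (1.5, -2.4, 3.2):
    print(z, classify_z(z))
# 1.5 not unusual
# -2.4 unusual
# 3.2 very unusual
```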

You are now ready to continue Section 5.1 on page 264 with the section 
exercises. 


APPENDIX B 


Table 1—Random Numbers 


92630 78240 19267 95457 53497 23894 37708 79862 76471 66418
79445 78735 71549 44843 26104 67318 00701 34986 66751 99723
59654 71966 27386 50004 05358 94031 29281 18544 52429 06080
31524 49587 76612 39789 13537 48086 59483 60680 84675 53014
06348 76938 90379 51392 55887 71015 09209 79157 24440 30244
28703 51709 94456 48396 73780 06436 86641 69239 57662 80181
68108 89266 94730 95761 75023 48464 65544 96583 18911 16391
99938 90704 93621 66330 33393 95261 95349 51769 91616 33238
91543 73196 34449 63513 83834 99411 58826 40456 69268 48562
42103 02781 73920 56297 72678 12249 25270 36678 21313 75767
17138 27584 25296 28387 51350 61664 37893 05363 44143 42677
28297 14280 54524 21618 95320 38174 60579 08089 94999 78460
09331 56712 51333 06289 75345 08811 82711 57392 25252 30333
31295 04204 93712 51287 05754 79396 87399 51773 33075 97061
36146 15560 27592 42089 99281 59640 15221 96079 09961 05371
29553 18432 13630 05529 02791 81017 49027 79031 50912 09399
23501 22642 63081 08191 89420 67800 55137 54707 32945 64522
57888 85846 67967 07835 11314 01545 48535 17142 08552 67457
55336 71264 88472 04334 63919 36394 11196 92470 70543 29776
10087 10072 55980 64688 68239 20461 89381 93809 00796 95945
34101 81277 66090 88872 37818 72142 67140 50785 21380 16703
53362 44940 60430 22834 14130 96593 23298 56203 92671 15925
82975 66158 84731 19436 55790 69229 28661 13675 99318 76873
54827 84673 22898 08094 14326 87038 42892 21127 30712 48489
25464 59098 27436 89421 80754 89924 19097 67737 80368 08795
67609 60214 41475 84950 40133 02546 09570 45682 50165 15609
44921 70924 61295 51137 47596 86735 35561 76649 18217 63446
33170 30972 98130 95828 49786 13301 36081 80761 33985 68621
84687 85445 06208 17654 51333 02878 35010 67578 61574 20749
71886 56450 36567 09395 96951 35507 17555 35212 69106 01679
00475 02224 74722 14721 40215 21351 08596 45625 83981 63748
25993 38881 68361 59560 41274 69742 40703 37993 03435 18873
92882 53178 99195 93803 56985 53089 15305 50522 55900 43026
25138 26810 07093 15677 60688 04410 24505 37890 67186 62829
84631 71882 12991 83028 82484 90339 91950 74579 03539 90122
34003 92326 12793 61453 48121 74271 28363 66561 75220 35908
53775 45749 05734 86169 42762 70175 97310 73894 88606 19994
59316 97885 72807 54966 60859 11932 35265 71601 55577 67715
20479 66557 50705 26999 09854 52591 14063 30214 19890 19292
86180 84931 25455 26044 02227 52015 21820 50599 51671 65411
21451 68001 72710 40261 61281 13172 63819 48970 51732 54113
98062 68375 80089 24135 72355 95428 11808 29740 81644 86610
01788 64429 14430 94575 75153 94576 61393 96192 03227 32258
62465 04841 43272 68702 01274 05437 22953 18946 99053 41690
94324 31089 84159 92933 99989 89500 91586 02802 69471 68274
05797 43984 21575 09908 70221 19791 51578 36432 33494 79888
10395 14289 52185 09721 25789 38562 54794 04897 59012 89251
35177 56986 25549 59730 64718 52630 31100 62384 49483 11409
25633 89619 75882 98256 02126 72099 57183 55887 09320 73463
16464 48280 94254 45777 45150 68865 11382 11782 22695 41988

A Million Random Digits with 100,000 Normal Deviates by the Rand Corporation
(New York: The Free Press, 1955).


A7 


A8 APPENDIX B Table 2—Binomial Distribution


Table 2—Binomial Distribution 


This table shows the probability of x successes in n independent trials, each with
probability of success p.
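Each entry in the table is a rounded value of the binomial probability formula P(x) = ₙCₓ pˣ(1 - p)ⁿ⁻ˣ. As a sketch, assuming Python 3.8 or later for `math.comb`, the function below reproduces one column of the table; the function name is hypothetical.

```python
from math import comb


def binomial_probability(n, x, p):
    """P(x successes in n independent trials, each with success probability p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


# Reproduce the n = 4, p = .25 column of the table, rounded to three places.
print([round(binomial_probability(4, x, 0.25), 3) for x in range(5)])
# [0.316, 0.422, 0.211, 0.047, 0.004]
```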


n  x  .01  .05  .10  .15  .20  .25  .30  .35  .40  .45  .50  .55  .60  .65  .70  .75  .80  .85  .90  .95
2  0 .980 .902 .810 .723 .640 .563 .490 .423 .360 .303 .250 .203 .160 .123 .090 .063 .040 .023 .010 .002
   1 .020 .095 .180 .255 .320 .375 .420 .455 .480 .495 .500 .495 .480 .455 .420 .375 .320 .255 .180 .095
   2 .000 .002 .010 .023 .040 .063 .090 .123 .160 .203 .250 .303 .360 .423 .490 .563 .640 .723 .810 .902
3  0 .970 .857 .729 .614 .512 .422 .343 .275 .216 .166 .125 .091 .064 .043 .027 .016 .008 .003 .001 .000
   1 .029 .135 .243 .325 .384 .422 .441 .444 .432 .408 .375 .334 .288 .239 .189 .141 .096 .057 .027 .007
   2 .000 .007 .027 .057 .096 .141 .189 .239 .288 .334 .375 .408 .432 .444 .441 .422 .384 .325 .243 .135
   3 .000 .000 .001 .003 .008 .016 .027 .043 .064 .091 .125 .166 .216 .275 .343 .422 .512 .614 .729 .857
4  0 .961 .815 .656 .522 .410 .316 .240 .179 .130 .092 .062 .041 .026 .015 .008 .004 .002 .001 .000 .000
   1 .039 .171 .292 .368 .410 .422 .412 .384 .346 .300 .250 .200 .154 .112 .076 .047 .026 .011 .004 .000
   2 .001 .014 .049 .098 .154 .211 .265 .311 .346 .368 .375 .368 .346 .311 .265 .211 .154 .098 .049 .014
   3 .000 .000 .004 .011 .026 .047 .076 .112 .154 .200 .250 .300 .346 .384 .412 .422 .410 .368 .292 .171
   4 .000 .000 .000 .001 .002 .004 .008 .015 .026 .041 .062 .092 .130 .179 .240 .316 .410 .522 .656 .815
5  0 .951 .774 .590 .444 .328 .237 .168 .116 .078 .050 .031 .019 .010 .005 .002 .001 .000 .000 .000 .000
   1 .048 .204 .328 .392 .410 .396 .360 .312 .259 .206 .156 .113 .077 .049 .028 .015 .006 .002 .000 .000
   2 .001 .021 .073 .138 .205 .264 .309 .336 .346 .337 .312 .276 .230 .181 .132 .088 .051 .024 .008 .001
   3 .000 .001 .008 .024 .051 .088 .132 .181 .230 .276 .312 .337 .346 .336 .309 .264 .205 .138 .073 .021
   4 .000 .000 .000 .002 .006 .015 .028 .049 .077 .113 .156 .206 .259 .312 .360 .396 .410 .392 .328 .204
   5 .000 .000 .000 .000 .000 .001 .002 .005 .010 .019 .031 .050 .078 .116 .168 .237 .328 .444 .590 .774
6  0 .941 .735 .531 .377 .262 .178 .118 .075 .047 .028 .016 .008 .004 .002 .001 .000 .000 .000 .000 .000
   1 .057 .232 .354 .399 .393 .356 .303 .244 .187 .136 .094 .061 .037 .020 .010 .004 .002 .000 .000 .000
   2 .001 .031 .098 .176 .246 .297 .324 .328 .311 .278 .234 .186 .138 .095 .060 .033 .015 .006 .001 .000
   3 .000 .002 .015 .042 .082 .132 .185 .236 .276 .303 .312 .303 .276 .236 .185 .132 .082 .042 .015 .002
   4 .000 .000 .001 .006 .015 .033 .060 .095 .138 .186 .234 .278 .311 .328 .324 .297 .246 .176 .098 .031
   5 .000 .000 .000 .000 .002 .004 .010 .020 .037 .061 .094 .136 .187 .244 .303 .356 .393 .399 .354 .232
   6 .000 .000 .000 .000 .000 .000 .001 .002 .004 .008 .016 .028 .047 .075 .118 .178 .262 .377 .531 .735
7  0 .932 .698 .478 .321 .210 .133 .082 .049 .028 .015 .008 .004 .002 .001 .000 .000 .000 .000 .000 .000
   1 .066 .257 .372 .396 .367 .311 .247 .185 .131 .087 .055 .032 .017 .008 .004 .001 .000 .000 .000 .000
   2 .002 .041 .124 .210 .275 .311 .318 .299 .261 .214 .164 .117 .077 .047 .025 .012 .004 .001 .000 .000
   3 .000 .004 .023 .062 .115 .173 .227 .268 .290 .292 .273 .239 .194 .144 .097 .058 .029 .011 .003 .000
   4 .000 .000 .003 .011 .029 .058 .097 .144 .194 .239 .273 .292 .290 .268 .227 .173 .115 .062 .023 .004
   5 .000 .000 .000 .001 .004 .012 .025 .047 .077 .117 .164 .214 .261 .299 .318 .311 .275 .210 .124 .041
   6 .000 .000 .000 .000 .000 .001 .004 .008 .017 .032 .055 .087 .131 .185 .247 .311 .367 .396 .372 .257
   7 .000 .000 .000 .000 .000 .000 .000 .001 .002 .004 .008 .015 .028 .049 .082 .133 .210 .321 .478 .698
8  0 .923 .663 .430 .272 .168 .100 .058 .032 .017 .008 .004 .002 .001 .000 .000 .000 .000 .000 .000 .000
   1 .075 .279 .383 .385 .336 .267 .198 .137 .090 .055 .031 .016 .008 .003 .001 .000 .000 .000 .000 .000
   2 .003 .051 .149 .238 .294 .311 .296 .259 .209 .157 .109 .070 .041 .022 .010 .004 .001 .000 .000 .000
   3 .000 .005 .033 .084 .147 .208 .254 .279 .279 .257 .219 .172 .124 .081 .047 .023 .009 .003 .000 .000
   4 .000 .000 .005 .018 .046 .087 .136 .188 .232 .263 .273 .263 .232 .188 .136 .087 .046 .018 .005 .000
   5 .000 .000 .000 .003 .009 .023 .047 .081 .124 .172 .219 .257 .279 .279 .254 .208 .147 .084 .033 .005
   6 .000 .000 .000 .000 .001 .004 .010 .022 .041 .070 .109 .157 .209 .259 .296 .311 .294 .238 .149 .051
   7 .000 .000 .000 .000 .000 .000 .001 .003 .008 .016 .031 .055 .090 .137 .198 .267 .336 .385 .383 .279
   8 .000 .000 .000 .000 .000 .000 .000 .000 .001 .002 .004 .008 .017 .032 .058 .100 .168 .272 .430 .663
9  0 .914 .630 .387 .232 .134 .075 .040 .021 .010 .005 .002 .001 .000 .000 .000 .000 .000 .000 .000 .000
   1 .083 .299 .387 .368 .302 .225 .156 .100 .060 .034 .018 .008 .004 .001 .000 .000 .000 .000 .000 .000
   2 .003 .063 .172 .260 .302 .300 .267 .216 .161 .111 .070 .041 .021 .010 .004 .001 .000 .000 .000 .000
   3 .000 .008 .045 .107 .176 .234 .267 .272 .251 .212 .164 .116 .074 .042 .021 .009 .003 .001 .000 .000
   4 .000 .001 .007 .028 .066 .117 .172 .219 .251 .260 .246 .213 .167 .118 .074 .039 .017 .005 .001 .000
   5 .000 .000 .001 .005 .017 .039 .074 .118 .167 .213 .246 .260 .251 .219 .172 .117 .066 .028 .007 .001
   6 .000 .000 .000 .001 .003 .009 .021 .042 .074 .116 .164 .212 .251 .272 .267 .234 .176 .107 .045 .008
   7 .000 .000 .000 .000 .000 .001 .004 .010 .021 .041 .070 .111 .161 .216 .267 .300 .302 .260 .172 .063
   8 .000 .000 .000 .000 .000 .000 .000 .001 .004 .008 .018 .034 .060 .100 .156 .225 .302 .368 .387 .299
   9 .000 .000 .000 .000 .000 .000 .000 .000 .000 .001 .002 .005 .010 .021 .040 .075 .134 .232 .387 .630

From Brase/Brase, Understandable Statistics, Sixth Edition.


APPENDIX B Table 2—Binomial Distribution AY 


Table 2—Binomial Distribution (continued) 


                                                      p
 n   x  .01  .05  .10  .15  .20  .25  .30  .35  .40  .45  .50  .55  .60  .65  .70  .75  .80  .85  .90  .95
10   0 .904 .599 .349 .197 .107 .056 .028 .014 .006 .003 .001 .000 .000 .000 .000 .000 .000 .000 .000 .000
     1 .091 .315 .387 .347 .268 .188 .121 .072 .040 .021 .010 .004 .002 .000 .000 .000 .000 .000 .000 .000
     2 .004 .075 .194 .276 .302 .282 .233 .176 .121 .076 .044 .023 .011 .004 .001 .000 .000 .000 .000 .000
     3 .000 .010 .057 .130 .201 .250 .267 .252 .215 .166 .117 .075 .042 .021 .009 .003 .001 .000 .000 .000
     4 .000 .001 .011 .040 .088 .146 .200 .238 .251 .238 .205 .160 .111 .069 .037 .016 .006 .001 .000 .000
     5 .000 .000 .001 .008 .026 .058 .103 .154 .201 .234 .246 .234 .201 .154 .103 .058 .026 .008 .001 .000
     6 .000 .000 .000 .001 .006 .016 .037 .069 .111 .160 .205 .238 .251 .238 .200 .146 .088 .040 .011 .001
     7 .000 .000 .000 .000 .001 .003 .009 .021 .042 .075 .117 .166 .215 .252 .267 .250 .201 .130 .057 .010
     8 .000 .000 .000 .000 .000 .000 .001 .004 .011 .023 .044 .076 .121 .176 .233 .282 .302 .276 .194 .075
     9 .000 .000 .000 .000 .000 .000 .000 .000 .002 .004 .010 .021 .040 .072 .121 .188 .268 .347 .387 .315
    10 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .001 .003 .006 .014 .028 .056 .107 .197 .349 .599
11   0 .895 .569 .314 .167 .086 .042 .020 .009 .004 .001 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000
     1 .099 .329 .384 .325 .236 .155 .093 .052 .027 .013 .005 .002 .001 .000 .000 .000 .000 .000 .000 .000
     2 .005 .087 .213 .287 .295 .258 .200 .140 .089 .051 .027 .013 .005 .002 .001 .000 .000 .000 .000 .000
     3 .000 .014 .071 .152 .221 .258 .257 .225 .177 .126 .081 .046 .023 .010 .004 .001 .000 .000 .000 .000
     4 .000 .001 .016 .054 .111 .172 .220 .243 .236 .206 .161 .113 .070 .038 .017 .006 .002 .000 .000 .000
     5 .000 .000 .002 .013 .039 .080 .132 .183 .221 .236 .226 .193 .147 .099 .057 .027 .010 .002 .000 .000
     6 .000 .000 .000 .002 .010 .027 .057 .099 .147 .193 .226 .236 .221 .183 .132 .080 .039 .013 .002 .000
     7 .000 .000 .000 .000 .002 .006 .017 .038 .070 .113 .161 .206 .236 .243 .220 .172 .111 .054 .016 .001
     8 .000 .000 .000 .000 .000 .001 .004 .010 .023 .046 .081 .126 .177 .225 .257 .258 .221 .152 .071 .014
     9 .000 .000 .000 .000 .000 .000 .001 .002 .005 .013 .027 .051 .089 .140 .200 .258 .295 .287 .213 .087
    10 .000 .000 .000 .000 .000 .000 .000 .000 .001 .002 .005 .013 .027 .052 .093 .155 .236 .325 .384 .329
    11 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .001 .004 .009 .020 .042 .086 .167 .314 .569
12   0 .886 .540 .282 .142 .069 .032 .014 .006 .002 .001 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000
     1 .107 .341 .377 .301 .206 .127 .071 .037 .017 .008 .003 .001 .000 .000 .000 .000 .000 .000 .000 .000
     2 .006 .099 .230 .292 .283 .232 .168 .109 .064 .034 .016 .007 .002 .001 .000 .000 .000 .000 .000 .000
     3 .000 .017 .085 .172 .236 .258 .240 .195 .142 .092 .054 .028 .012 .005 .001 .000 .000 .000 .000 .000
     4 .000 .002 .021 .068 .133 .194 .231 .237 .213 .170 .121 .076 .042 .020 .008 .002 .001 .000 .000 .000
     5 .000 .000 .004 .019 .053 .103 .158 .204 .227 .223 .193 .149 .101 .059 .029 .011 .003 .001 .000 .000
     6 .000 .000 .000 .004 .016 .040 .079 .128 .177 .212 .226 .212 .177 .128 .079 .040 .016 .004 .000 .000
     7 .000 .000 .000 .001 .003 .011 .029 .059 .101 .149 .193 .223 .227 .204 .158 .103 .053 .019 .004 .000
     8 .000 .000 .000 .000 .001 .002 .008 .020 .042 .076 .121 .170 .213 .237 .231 .194 .133 .068 .021 .002
     9 .000 .000 .000 .000 .000 .000 .001 .005 .012 .028 .054 .092 .142 .195 .240 .258 .236 .172 .085 .017
    10 .000 .000 .000 .000 .000 .000 .000 .001 .002 .007 .016 .034 .064 .109 .168 .232 .283 .292 .230 .099
    11 .000 .000 .000 .000 .000 .000 .000 .000 .000 .001 .003 .008 .017 .037 .071 .127 .206 .301 .377 .341
    12 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .001 .002 .006 .014 .032 .069 .142 .282 .540
15   0 .860 .463 .206 .087 .035 .013 .005 .002 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000
     1 .130 .366 .343 .231 .132 .067 .031 .013 .005 .002 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000
     2 .009 .135 .267 .286 .231 .156 .092 .048 .022 .009 .003 .001 .000 .000 .000 .000 .000 .000 .000 .000
     3 .000 .031 .129 .218 .250 .225 .170 .111 .063 .032 .014 .005 .002 .000 .000 .000 .000 .000 .000 .000
     4 .000 .005 .043 .116 .188 .225 .219 .179 .127 .078 .042 .019 .007 .002 .001 .000 .000 .000 .000 .000
     5 .000 .001 .010 .045 .103 .165 .206 .212 .186 .140 .092 .051 .024 .010 .003 .001 .000 .000 .000 .000
     6 .000 .000 .002 .013 .043 .092 .147 .191 .207 .191 .153 .105 .061 .030 .012 .003 .001 .000 .000 .000
     7 .000 .000 .000 .003 .014 .039 .081 .132 .177 .201 .196 .165 .118 .071 .035 .013 .003 .001 .000 .000
     8 .000 .000 .000 .001 .003 .013 .035 .071 .118 .165 .196 .201 .177 .132 .081 .039 .014 .003 .000 .000
     9 .000 .000 .000 .000 .001 .003 .012 .030 .061 .105 .153 .191 .207 .191 .147 .092 .043 .013 .002 .000
    10 .000 .000 .000 .000 .000 .001 .003 .010 .024 .051 .092 .140 .186 .212 .206 .165 .103 .045 .010 .001
    11 .000 .000 .000 .000 .000 .000 .001 .002 .007 .019 .042 .078 .127 .179 .219 .225 .188 .116 .043 .005
    12 .000 .000 .000 .000 .000 .000 .000 .000 .002 .005 .014 .032 .063 .111 .170 .225 .250 .218 .129 .031
    13 .000 .000 .000 .000 .000 .000 .000 .000 .000 .001 .003 .009 .022 .048 .092 .156 .231 .286 .267 .135
    14 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .002 .005 .013 .031 .067 .132 .231 .343 .366
    15 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .002 .005 .013 .035 .087 .206 .463






Table 2—Binomial Distribution (continued) 


                                                      p
 n   x  .01  .05  .10  .15  .20  .25  .30  .35  .40  .45  .50  .55  .60  .65  .70  .75  .80  .85  .90  .95
16   0 .851 .440 .185 .074 .028 .010 .003 .001 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000
     1 .138 .371 .329 .210 .113 .053 .023 .009 .003 .001 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000
     2 .010 .146 .275 .277 .211 .134 .073 .035 .015 .006 .002 .001 .000 .000 .000 .000 .000 .000 .000 .000
     3 .000 .036 .142 .229 .246 .208 .146 .089 .047 .022 .009 .003 .001 .000 .000 .000 .000 .000 .000 .000
     4 .000 .006 .051 .131 .200 .225 .204 .155 .101 .057 .028 .011 .004 .001 .000 .000 .000 .000 .000 .000
     5 .000 .001 .014 .056 .120 .180 .210 .201 .162 .112 .067 .034 .014 .005 .001 .000 .000 .000 .000 .000
     6 .000 .000 .003 .018 .055 .110 .165 .198 .198 .168 .122 .075 .039 .017 .006 .001 .000 .000 .000 .000
     7 .000 .000 .000 .005 .020 .052 .101 .152 .189 .197 .175 .132 .084 .044 .019 .006 .001 .000 .000 .000
     8 .000 .000 .000 .001 .006 .020 .049 .092 .142 .181 .196 .181 .142 .092 .049 .020 .006 .001 .000 .000
     9 .000 .000 .000 .000 .001 .006 .019 .044 .084 .132 .175 .197 .189 .152 .101 .052 .020 .005 .000 .000
    10 .000 .000 .000 .000 .000 .001 .006 .017 .039 .075 .122 .168 .198 .198 .165 .110 .055 .018 .003 .000
    11 .000 .000 .000 .000 .000 .000 .001 .005 .014 .034 .067 .112 .162 .201 .210 .180 .120 .056 .014 .001
    12 .000 .000 .000 .000 .000 .000 .000 .001 .004 .011 .028 .057 .101 .155 .204 .225 .200 .131 .051 .006
    13 .000 .000 .000 .000 .000 .000 .000 .000 .001 .003 .009 .022 .047 .089 .146 .208 .246 .229 .142 .036
    14 .000 .000 .000 .000 .000 .000 .000 .000 .000 .001 .002 .006 .015 .035 .073 .134 .211 .277 .275 .146
    15 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .001 .003 .009 .023 .053 .113 .210 .329 .371
    16 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .001 .003 .010 .028 .074 .185 .440
20   0 .818 .358 .122 .039 .012 .003 .001 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000
     1 .165 .377 .270 .137 .058 .021 .007 .002 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000
     2 .016 .189 .285 .229 .137 .067 .028 .010 .003 .001 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000
     3 .001 .060 .190 .243 .205 .134 .072 .032 .012 .004 .001 .000 .000 .000 .000 .000 .000 .000 .000 .000
     4 .000 .013 .090 .182 .218 .190 .130 .074 .035 .014 .005 .001 .000 .000 .000 .000 .000 .000 .000 .000
     5 .000 .002 .032 .103 .175 .202 .179 .127 .075 .036 .015 .005 .001 .000 .000 .000 .000 .000 .000 .000
     6 .000 .000 .009 .045 .109 .169 .192 .171 .124 .075 .036 .015 .005 .001 .000 .000 .000 .000 .000 .000
     7 .000 .000 .002 .016 .055 .112 .164 .184 .166 .122 .074 .037 .015 .005 .001 .000 .000 .000 .000 .000
     8 .000 .000 .000 .005 .022 .061 .114 .161 .180 .162 .120 .073 .035 .014 .004 .001 .000 .000 .000 .000
     9 .000 .000 .000 .001 .007 .027 .065 .116 .160 .177 .160 .119 .071 .034 .012 .003 .000 .000 .000 .000
    10 .000 .000 .000 .000 .002 .010 .031 .069 .117 .159 .176 .159 .117 .069 .031 .010 .002 .000 .000 .000
    11 .000 .000 .000 .000 .000 .003 .012 .034 .071 .119 .160 .177 .160 .116 .065 .027 .007 .001 .000 .000
    12 .000 .000 .000 .000 .000 .001 .004 .014 .035 .073 .120 .162 .180 .161 .114 .061 .022 .005 .000 .000
    13 .000 .000 .000 .000 .000 .000 .001 .005 .015 .037 .074 .122 .166 .184 .164 .112 .055 .016 .002 .000
    14 .000 .000 .000 .000 .000 .000 .000 .001 .005 .015 .037 .075 .124 .171 .192 .169 .109 .045 .009 .000
    15 .000 .000 .000 .000 .000 .000 .000 .000 .001 .005 .015 .036 .075 .127 .179 .202 .175 .103 .032 .002
    16 .000 .000 .000 .000 .000 .000 .000 .000 .000 .001 .005 .014 .035 .074 .130 .190 .218 .182 .090 .013
    17 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .001 .004 .012 .032 .072 .134 .205 .243 .190 .060
    18 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .001 .003 .010 .028 .067 .137 .229 .285 .189
    19 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .002 .007 .021 .058 .137 .270 .377
    20 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .001 .003 .012 .039 .122 .358
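Each entry in Table 2 is a binomial probability P(x) = nCx p^x q^(n - x), where q = 1 - p, rounded to three decimal places. The Python sketch below (standard library only; the function names are illustrative and do not come from the text) recomputes an entry so that a reading from the table can be checked. Rounding is done by Python's own formatting, so a value lying almost exactly halfway between two printed digits could occasionally differ from the table in the last place.

```python
from math import comb


def binomial_probability(n: int, p: float, x: int) -> float:
    """P(x) = nCx * p**x * q**(n - x), where q = 1 - p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_entry(n: int, p: float, x: int) -> str:
    """Format a probability like Table 2: three decimal places, no leading zero."""
    return f"{binomial_probability(n, p, x):.3f}".lstrip("0")


# n = 10, p = .10, x = 1 and n = 20, p = .95, x = 20
print(binomial_entry(10, 0.10, 1))   # .387
print(binomial_entry(20, 0.95, 20))  # .358
```

Because the entries for a fixed n and p are the probabilities of every possible x, they sum to 1 apart from rounding, which is a quick way to spot a misread row.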




Table 3—Poisson Distribution 


μ
 x   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1.0
 0 .9048 .8187 .7408 .6703 .6065 .5488 .4966 .4493 .4066 .3679
 1 .0905 .1637 .2222 .2681 .3033 .3293 .3476 .3595 .3659 .3679
 2 .0045 .0164 .0333 .0536 .0758 .0988 .1217 .1438 .1647 .1839
 3 .0002 .0011 .0033 .0072 .0126 .0198 .0284 .0383 .0494 .0613
 4 .0000 .0001 .0003 .0007 .0016 .0030 .0050 .0077 .0111 .0153
 5 .0000 .0000 .0000 .0001 .0002 .0004 .0007 .0012 .0020 .0031
 6 .0000 .0000 .0000 .0000 .0000 .0000 .0001 .0002 .0003 .0005
 7 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0001

μ
 x   1.1   1.2   1.3   1.4   1.5   1.6   1.7   1.8   1.9   2.0
 0 .3329 .3012 .2725 .2466 .2231 .2019 .1827 .1653 .1496 .1353
 1 .3662 .3614 .3543 .3452 .3347 .3230 .3106 .2975 .2842 .2707
 2 .2014 .2169 .2303 .2417 .2510 .2584 .2640 .2678 .2700 .2707
 3 .0738 .0867 .0998 .1128 .1255 .1378 .1496 .1607 .1710 .1804
 4 .0203 .0260 .0324 .0395 .0471 .0551 .0636 .0723 .0812 .0902
 5 .0045 .0062 .0084 .0111 .0141 .0176 .0216 .0260 .0309 .0361
 6 .0008 .0012 .0018 .0026 .0035 .0047 .0061 .0078 .0098 .0120
 7 .0001 .0002 .0003 .0005 .0008 .0011 .0015 .0020 .0027 .0034
 8 .0000 .0000 .0001 .0001 .0001 .0002 .0003 .0005 .0006 .0009
 9 .0000 .0000 .0000 .0000 .0000 .0000 .0001 .0001 .0001 .0002

μ
 x   2.1   2.2   2.3   2.4   2.5   2.6   2.7   2.8   2.9   3.0
 0 .1225 .1108 .1003 .0907 .0821 .0743 .0672 .0608 .0550 .0498
 1 .2572 .2438 .2306 .2177 .2052 .1931 .1815 .1703 .1596 .1494
 2 .2700 .2681 .2652 .2613 .2565 .2510 .2450 .2384 .2314 .2240
 3 .1890 .1966 .2033 .2090 .2138 .2176 .2205 .2225 .2237 .2240
 4 .0992 .1082 .1169 .1254 .1336 .1414 .1488 .1557 .1622 .1680
 5 .0417 .0476 .0538 .0602 .0668 .0735 .0804 .0872 .0940 .1008
 6 .0146 .0174 .0206 .0241 .0278 .0319 .0362 .0407 .0455 .0504
 7 .0044 .0055 .0068 .0083 .0099 .0118 .0139 .0163 .0188 .0216
 8 .0011 .0015 .0019 .0025 .0031 .0038 .0047 .0057 .0068 .0081
 9 .0003 .0004 .0005 .0007 .0009 .0011 .0014 .0018 .0022 .0027
10 .0001 .0001 .0001 .0002 .0002 .0003 .0004 .0005 .0006 .0008
11 .0000 .0000 .0000 .0000 .0000 .0001 .0001 .0001 .0002 .0002
12 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0001

μ
 x   3.1   3.2   3.3   3.4   3.5   3.6   3.7   3.8   3.9   4.0
 0 .0450 .0408 .0369 .0334 .0302 .0273 .0247 .0224 .0202 .0183
 1 .1397 .1304 .1217 .1135 .1057 .0984 .0915 .0850 .0789 .0733
 2 .2165 .2087 .2008 .1929 .1850 .1771 .1692 .1615 .1539 .1465
 3 .2237 .2226 .2209 .2186 .2158 .2125 .2087 .2046 .2001 .1954
 4 .1734 .1781 .1823 .1858 .1888 .1912 .1931 .1944 .1951 .1954
 5 .1075 .1140 .1203 .1264 .1322 .1377 .1429 .1477 .1522 .1563
 6 .0555 .0608 .0662 .0716 .0771 .0826 .0881 .0936 .0989 .1042
 7 .0246 .0278 .0312 .0348 .0385 .0425 .0466 .0508 .0551 .0595
 8 .0095 .0111 .0129 .0148 .0169 .0191 .0215 .0241 .0269 .0298
 9 .0033 .0040 .0047 .0056 .0066 .0076 .0089 .0102 .0116 .0132
10 .0010 .0013 .0016 .0019 .0023 .0028 .0033 .0039 .0045 .0053
11 .0003 .0004 .0005 .0006 .0007 .0009 .0011 .0013 .0016 .0019
12 .0001 .0001 .0001 .0002 .0002 .0003 .0003 .0004 .0005 .0006
13 .0000 .0000 .0000 .0000 .0001 .0001 .0001 .0001 .0002 .0002
14 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0001


W. H. Beyer, Handbook of Tables for Probability and Statistics, 2e, CRC Press, 
Boca Raton, Florida, 1986. 




Table 3—Poisson Distribution (continued) 


μ
 x   4.1   4.2   4.3   4.4   4.5   4.6   4.7   4.8   4.9   5.0
 0 .0166 .0150 .0136 .0123 .0111 .0101 .0091 .0082 .0074 .0067
 1 .0679 .0630 .0583 .0540 .0500 .0462 .0427 .0395 .0365 .0337
 2 .1393 .1323 .1254 .1188 .1125 .1063 .1005 .0948 .0894 .0842
 3 .1904 .1852 .1798 .1743 .1687 .1631 .1574 .1517 .1460 .1404
 4 .1951 .1944 .1933 .1917 .1898 .1875 .1849 .1820 .1789 .1755
 5 .1600 .1633 .1662 .1687 .1708 .1725 .1738 .1747 .1753 .1755
 6 .1093 .1143 .1191 .1237 .1281 .1323 .1362 .1398 .1432 .1462
 7 .0640 .0686 .0732 .0778 .0824 .0869 .0914 .0959 .1002 .1044
 8 .0328 .0360 .0393 .0428 .0463 .0500 .0537 .0575 .0614 .0653
 9 .0150 .0168 .0188 .0209 .0232 .0255 .0280 .0307 .0334 .0363
10 .0061 .0071 .0081 .0092 .0104 .0118 .0132 .0147 .0164 .0181
11 .0023 .0027 .0032 .0037 .0043 .0049 .0056 .0064 .0073 .0082
12 .0008 .0009 .0011 .0014 .0016 .0019 .0022 .0026 .0030 .0034
13 .0002 .0003 .0004 .0005 .0006 .0007 .0008 .0009 .0011 .0013
14 .0001 .0001 .0001 .0001 .0002 .0002 .0003 .0003 .0004 .0005
15 .0000 .0000 .0000 .0000 .0001 .0001 .0001 .0001 .0001 .0002

μ
 x   5.1   5.2   5.3   5.4   5.5   5.6   5.7   5.8   5.9   6.0
 0 .0061 .0055 .0050 .0045 .0041 .0037 .0033 .0030 .0027 .0025
 1 .0311 .0287 .0265 .0244 .0225 .0207 .0191 .0176 .0162 .0149
 2 .0793 .0746 .0701 .0659 .0618 .0580 .0544 .0509 .0477 .0446
 3 .1348 .1293 .1239 .1185 .1133 .1082 .1033 .0985 .0938 .0892
 4 .1719 .1681 .1641 .1600 .1558 .1515 .1472 .1428 .1383 .1339
 5 .1753 .1748 .1740 .1728 .1714 .1697 .1678 .1656 .1632 .1606
 6 .1490 .1515 .1537 .1555 .1571 .1584 .1594 .1601 .1605 .1606
 7 .1086 .1125 .1163 .1200 .1234 .1267 .1298 .1326 .1353 .1377
 8 .0692 .0731 .0771 .0810 .0849 .0887 .0925 .0962 .0998 .1033
 9 .0392 .0423 .0454 .0486 .0519 .0552 .0586 .0620 .0654 .0688
10 .0200 .0220 .0241 .0262 .0285 .0309 .0334 .0359 .0386 .0413
11 .0093 .0104 .0116 .0129 .0143 .0157 .0173 .0190 .0207 .0225
12 .0039 .0045 .0051 .0058 .0065 .0073 .0082 .0092 .0102 .0113
13 .0015 .0018 .0021 .0024 .0028 .0032 .0036 .0041 .0046 .0052
14 .0006 .0007 .0008 .0009 .0011 .0013 .0015 .0017 .0019 .0022
15 .0002 .0002 .0003 .0003 .0004 .0005 .0006 .0007 .0008 .0009
16 .0001 .0001 .0001 .0001 .0001 .0002 .0002 .0002 .0003 .0003
17 .0000 .0000 .0000 .0000 .0000 .0001 .0001 .0001 .0001 .0001




Table 3—Poisson Distribution (continued) 


μ
 x   6.1   6.2   6.3   6.4   6.5   6.6   6.7   6.8   6.9   7.0
 0 .0022 .0020 .0018 .0017 .0015 .0014 .0012 .0011 .0010 .0009
 1 .0137 .0126 .0116 .0106 .0098 .0090 .0082 .0076 .0070 .0064
 2 .0417 .0390 .0364 .0340 .0318 .0296 .0276 .0258 .0240 .0223
 3 .0848 .0806 .0765 .0726 .0688 .0652 .0617 .0584 .0552 .0521
 4 .1294 .1249 .1205 .1162 .1118 .1076 .1034 .0992 .0952 .0912
 5 .1579 .1549 .1519 .1487 .1454 .1420 .1385 .1349 .1314 .1277
 6 .1605 .1601 .1595 .1586 .1575 .1562 .1546 .1529 .1511 .1490
 7 .1399 .1418 .1435 .1450 .1462 .1472 .1480 .1486 .1489 .1490
 8 .1066 .1099 .1130 .1160 .1188 .1215 .1240 .1263 .1284 .1304
 9 .0723 .0757 .0791 .0825 .0858 .0891 .0923 .0954 .0985 .1014
10 .0441 .0469 .0498 .0528 .0558 .0588 .0618 .0649 .0679 .0710
11 .0245 .0265 .0285 .0307 .0330 .0353 .0377 .0401 .0426 .0452
12 .0124 .0137 .0150 .0164 .0179 .0194 .0210 .0227 .0245 .0264
13 .0058 .0065 .0073 .0081 .0089 .0098 .0108 .0119 .0130 .0142
14 .0025 .0029 .0033 .0037 .0041 .0046 .0052 .0058 .0064 .0071
15 .0010 .0012 .0014 .0016 .0018 .0020 .0023 .0026 .0029 .0033
16 .0004 .0005 .0005 .0006 .0007 .0008 .0010 .0011 .0013 .0014
17 .0001 .0002 .0002 .0002 .0003 .0003 .0004 .0004 .0005 .0006
18 .0000 .0001 .0001 .0001 .0001 .0001 .0001 .0002 .0002 .0002
19 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0001 .0001 .0001

μ
 x   7.1   7.2   7.3   7.4   7.5   7.6   7.7   7.8   7.9   8.0
 0 .0008 .0007 .0007 .0006 .0006 .0005 .0005 .0004 .0004 .0003
 1 .0059 .0054 .0049 .0045 .0041 .0038 .0035 .0032 .0029 .0027
 2 .0208 .0194 .0180 .0167 .0156 .0145 .0134 .0125 .0116 .0107
 3 .0492 .0464 .0438 .0413 .0389 .0366 .0345 .0324 .0305 .0286
 4 .0874 .0836 .0799 .0764 .0729 .0696 .0663 .0632 .0602 .0573
 5 .1241 .1204 .1167 .1130 .1094 .1057 .1021 .0986 .0951 .0916
 6 .1468 .1445 .1420 .1394 .1367 .1339 .1311 .1282 .1252 .1221
 7 .1489 .1486 .1481 .1474 .1465 .1454 .1442 .1428 .1413 .1396
 8 .1321 .1337 .1351 .1363 .1373 .1382 .1388 .1392 .1395 .1396
 9 .1042 .1070 .1096 .1121 .1144 .1167 .1187 .1207 .1224 .1241
10 .0740 .0770 .0800 .0829 .0858 .0887 .0914 .0941 .0967 .0993
11 .0478 .0504 .0531 .0558 .0585 .0613 .0640 .0667 .0695 .0722
12 .0283 .0303 .0323 .0344 .0366 .0388 .0411 .0434 .0457 .0481
13 .0154 .0168 .0181 .0196 .0211 .0227 .0243 .0260 .0278 .0296
14 .0078 .0086 .0095 .0104 .0113 .0123 .0134 .0145 .0157 .0169
15 .0037 .0041 .0046 .0051 .0057 .0062 .0069 .0075 .0083 .0090
16 .0016 .0019 .0021 .0024 .0026 .0030 .0033 .0037 .0041 .0045
17 .0007 .0008 .0009 .0010 .0012 .0013 .0015 .0017 .0019 .0021
18 .0003 .0003 .0004 .0004 .0005 .0006 .0006 .0007 .0008 .0009
19 .0001 .0001 .0001 .0002 .0002 .0002 .0003 .0003 .0003 .0004
20 .0000 .0000 .0001 .0001 .0001 .0001 .0001 .0001 .0001 .0002
21 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0001 .0001




Table 3—Poisson Distribution (continued) 


μ
 x   8.1   8.2   8.3   8.4   8.5   8.6   8.7   8.8   8.9   9.0
 0 .0003 .0003 .0002 .0002 .0002 .0002 .0002 .0002 .0001 .0001
 1 .0025 .0023 .0021 .0019 .0017 .0016 .0014 .0013 .0012 .0011
 2 .0100 .0092 .0086 .0079 .0074 .0068 .0063 .0058 .0054 .0050
 3 .0269 .0252 .0237 .0222 .0208 .0195 .0183 .0171 .0160 .0150
 4 .0544 .0517 .0491 .0466 .0443 .0420 .0398 .0377 .0357 .0337
 5 .0882 .0849 .0816 .0784 .0752 .0722 .0692 .0663 .0635 .0607
 6 .1191 .1160 .1128 .1097 .1066 .1034 .1003 .0972 .0941 .0911
 7 .1378 .1358 .1338 .1317 .1294 .1271 .1247 .1222 .1197 .1171
 8 .1395 .1392 .1388 .1382 .1375 .1366 .1356 .1344 .1332 .1318
 9 .1256 .1269 .1280 .1290 .1299 .1306 .1311 .1315 .1317 .1318
10 .1017 .1040 .1063 .1084 .1104 .1123 .1140 .1157 .1172 .1186
11 .0749 .0776 .0802 .0828 .0853 .0878 .0902 .0925 .0948 .0970
12 .0505 .0530 .0555 .0579 .0604 .0629 .0654 .0679 .0703 .0728
13 .0315 .0334 .0354 .0374 .0395 .0416 .0438 .0459 .0481 .0504
14 .0182 .0196 .0210 .0225 .0240 .0256 .0272 .0289 .0306 .0324
15 .0098 .0107 .0116 .0126 .0136 .0147 .0158 .0169 .0182 .0194
16 .0050 .0055 .0060 .0066 .0072 .0079 .0086 .0093 .0101 .0109
17 .0024 .0026 .0029 .0033 .0036 .0040 .0044 .0048 .0053 .0058
18 .0011 .0012 .0014 .0015 .0017 .0019 .0021 .0024 .0026 .0029
19 .0005 .0005 .0006 .0007 .0008 .0009 .0010 .0011 .0012 .0014
20 .0002 .0002 .0002 .0003 .0003 .0004 .0004 .0005 .0005 .0006
21 .0001 .0001 .0001 .0001 .0001 .0002 .0002 .0002 .0002 .0003
22 .0000 .0000 .0000 .0000 .0001 .0001 .0001 .0001 .0001 .0001

μ
 x   9.1   9.2   9.3   9.4   9.5   9.6   9.7   9.8   9.9  10.0
 0 .0001 .0001 .0001 .0001 .0001 .0001 .0001 .0001 .0001 .0000
 1 .0010 .0009 .0009 .0008 .0007 .0007 .0006 .0005 .0005 .0005
 2 .0046 .0043 .0040 .0037 .0034 .0031 .0029 .0027 .0025 .0023
 3 .0140 .0131 .0123 .0115 .0107 .0100 .0093 .0087 .0081 .0076
 4 .0319 .0302 .0285 .0269 .0254 .0240 .0226 .0213 .0201 .0189
 5 .0581 .0555 .0530 .0506 .0483 .0460 .0439 .0418 .0398 .0378
 6 .0881 .0851 .0822 .0793 .0764 .0736 .0709 .0682 .0656 .0631
 7 .1145 .1118 .1091 .1064 .1037 .1010 .0982 .0955 .0928 .0901
 8 .1302 .1286 .1269 .1251 .1232 .1212 .1191 .1170 .1148 .1126
 9 .1317 .1315 .1311 .1306 .1300 .1293 .1284 .1274 .1263 .1251
10 .1198 .1210 .1219 .1228 .1235 .1241 .1245 .1249 .1250 .1251
11 .0991 .1012 .1031 .1049 .1067 .1083 .1098 .1112 .1125 .1137
12 .0752 .0776 .0799 .0822 .0844 .0866 .0888 .0908 .0928 .0948
13 .0526 .0549 .0572 .0594 .0617 .0640 .0662 .0685 .0707 .0729
14 .0342 .0361 .0380 .0399 .0419 .0439 .0459 .0479 .0500 .0521
15 .0208 .0221 .0235 .0250 .0265 .0281 .0297 .0313 .0330 .0347
16 .0118 .0127 .0137 .0147 .0157 .0168 .0180 .0192 .0204 .0217
17 .0063 .0069 .0075 .0081 .0088 .0095 .0103 .0111 .0119 .0128
18 .0032 .0035 .0039 .0042 .0046 .0051 .0055 .0060 .0065 .0071
19 .0015 .0017 .0019 .0021 .0023 .0026 .0028 .0031 .0034 .0037
20 .0007 .0008 .0009 .0010 .0011 .0012 .0014 .0015 .0017 .0019
21 .0003 .0003 .0004 .0004 .0005 .0006 .0006 .0007 .0008 .0009
22 .0001 .0001 .0002 .0002 .0002 .0002 .0003 .0003 .0004 .0004
23 .0000 .0001 .0001 .0001 .0001 .0001 .0001 .0001 .0002 .0002
24 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0001 .0001 .0001




Table 3—Poisson Distribution (continued) 


μ
 x    11    12    13    14    15    16    17    18    19    20
 0 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000
 1 .0002 .0001 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000
 2 .0010 .0004 .0002 .0001 .0000 .0000 .0000 .0000 .0000 .0000
 3 .0037 .0018 .0008 .0004 .0002 .0001 .0000 .0000 .0000 .0000
 4 .0102 .0053 .0027 .0013 .0006 .0003 .0001 .0001 .0000 .0000
 5 .0224 .0127 .0070 .0037 .0019 .0010 .0005 .0002 .0001 .0001
 6 .0411 .0255 .0152 .0087 .0048 .0026 .0014 .0007 .0004 .0002
 7 .0646 .0437 .0281 .0174 .0104 .0060 .0034 .0018 .0010 .0005
 8 .0888 .0655 .0457 .0304 .0194 .0120 .0072 .0042 .0024 .0013
 9 .1085 .0874 .0661 .0473 .0324 .0213 .0135 .0083 .0050 .0029
10 .1194 .1048 .0859 .0663 .0486 .0341 .0230 .0150 .0095 .0058
11 .1194 .1144 .1015 .0844 .0663 .0496 .0355 .0245 .0164 .0106
12 .1094 .1144 .1099 .0984 .0829 .0661 .0504 .0368 .0259 .0176
13 .0926 .1056 .1099 .1060 .0956 .0814 .0658 .0509 .0378 .0271
14 .0728 .0905 .1021 .1060 .1024 .0930 .0800 .0655 .0514 .0387
15 .0534 .0724 .0885 .0989 .1024 .0992 .0906 .0786 .0650 .0516
16 .0367 .0543 .0719 .0866 .0960 .0992 .0963 .0884 .0772 .0646
17 .0237 .0383 .0550 .0713 .0847 .0934 .0963 .0936 .0863 .0760
18 .0145 .0256 .0397 .0554 .0706 .0830 .0909 .0936 .0911 .0844
19 .0084 .0161 .0272 .0409 .0557 .0699 .0814 .0887 .0911 .0888
20 .0046 .0097 .0177 .0286 .0418 .0559 .0692 .0798 .0866 .0888
21 .0024 .0055 .0109 .0191 .0299 .0426 .0560 .0684 .0783 .0846
22 .0012 .0030 .0065 .0121 .0204 .0310 .0433 .0560 .0676 .0769
23 .0006 .0016 .0037 .0074 .0133 .0216 .0320 .0438 .0559 .0669
24 .0003 .0008 .0020 .0043 .0083 .0144 .0226 .0328 .0442 .0557
25 .0001 .0004 .0010 .0024 .0050 .0092 .0154 .0237 .0336 .0446
26 .0000 .0002 .0005 .0013 .0029 .0057 .0101 .0164 .0246 .0343
27 .0000 .0001 .0002 .0007 .0016 .0034 .0063 .0109 .0173 .0254
28 .0000 .0000 .0001 .0003 .0009 .0019 .0038 .0070 .0117 .0181
29 .0000 .0000 .0001 .0002 .0004 .0011 .0023 .0044 .0077 .0125
30 .0000 .0000 .0000 .0001 .0002 .0006 .0013 .0026 .0049 .0083
31 .0000 .0000 .0000 .0000 .0001 .0003 .0007 .0015 .0030 .0054
32 .0000 .0000 .0000 .0000 .0001 .0001 .0004 .0009 .0018 .0034
33 .0000 .0000 .0000 .0000 .0000 .0001 .0002 .0005 .0010 .0020
34 .0000 .0000 .0000 .0000 .0000 .0000 .0001 .0002 .0006 .0012
35 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0001 .0003 .0007
36 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0001 .0002 .0004
37 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0001 .0002
38 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0001
39 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0001
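Each entry in Table 3 is the Poisson probability P(x) = μ^x e^(-μ) / x!, rounded to four decimal places. A minimal Python sketch for recomputing an entry, using only the standard library (the function names are illustrative, not from the text):

```python
from math import exp, factorial


def poisson_probability(mu: float, x: int) -> float:
    """P(x) = mu**x * e**(-mu) / x! for a Poisson distribution with mean mu."""
    return mu**x * exp(-mu) / factorial(x)


def poisson_entry(mu: float, x: int) -> str:
    """Format a probability like Table 3: four decimal places, no leading zero."""
    return f"{poisson_probability(mu, x):.4f}".lstrip("0")


print(poisson_entry(3.5, 3))    # .2158
print(poisson_entry(10.0, 10))  # .1251
```

Summing entries gives cumulative probabilities; for example, with μ = 3.0, P(x ≤ 2) = .0498 + .1494 + .2240 ≈ .4232.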




Table 4—Standard Normal Distribution 


   z   .09   .08   .07   .06   .05   .04   .03   .02   .01   .00
-3.4 .0002 .0003 .0003 .0003 .0003 .0003 .0003 .0003 .0003 .0003
-3.3 .0003 .0004 .0004 .0004 .0004 .0004 .0004 .0005 .0005 .0005
-3.2 .0005 .0005 .0005 .0006 .0006 .0006 .0006 .0006 .0007 .0007
-3.1 .0007 .0007 .0008 .0008 .0008 .0008 .0009 .0009 .0009 .0010
-3.0 .0010 .0010 .0011 .0011 .0011 .0012 .0012 .0013 .0013 .0013
-2.9 .0014 .0014 .0015 .0015 .0016 .0016 .0017 .0018 .0018 .0019
-2.8 .0019 .0020 .0021 .0021 .0022 .0023 .0023 .0024 .0025 .0026
-2.7 .0026 .0027 .0028 .0029 .0030 .0031 .0032 .0033 .0034 .0035
-2.6 .0036 .0037 .0038 .0039 .0040 .0041 .0043 .0044 .0045 .0047
-2.5 .0048 .0049 .0051 .0052 .0054 .0055 .0057 .0059 .0060 .0062
-2.4 .0064 .0066 .0068 .0069 .0071 .0073 .0075 .0078 .0080 .0082
-2.3 .0084 .0087 .0089 .0091 .0094 .0096 .0099 .0102 .0104 .0107
-2.2 .0110 .0113 .0116 .0119 .0122 .0125 .0129 .0132 .0136 .0139
-2.1 .0143 .0146 .0150 .0154 .0158 .0162 .0166 .0170 .0174 .0179
-2.0 .0183 .0188 .0192 .0197 .0202 .0207 .0212 .0217 .0222 .0228
-1.9 .0233 .0239 .0244 .0250 .0256 .0262 .0268 .0274 .0281 .0287
-1.8 .0294 .0301 .0307 .0314 .0322 .0329 .0336 .0344 .0351 .0359
-1.7 .0367 .0375 .0384 .0392 .0401 .0409 .0418 .0427 .0436 .0446
-1.6 .0455 .0465 .0475 .0485 .0495 .0505 .0516 .0526 .0537 .0548
-1.5 .0559 .0571 .0582 .0594 .0606 .0618 .0630 .0643 .0655 .0668
-1.4 .0681 .0694 .0708 .0721 .0735 .0749 .0764 .0778 .0793 .0808
-1.3 .0823 .0838 .0853 .0869 .0885 .0901 .0918 .0934 .0951 .0968
-1.2 .0985 .1003 .1020 .1038 .1056 .1075 .1093 .1112 .1131 .1151
-1.1 .1170 .1190 .1210 .1230 .1251 .1271 .1292 .1314 .1335 .1357
-1.0 .1379 .1401 .1423 .1446 .1469 .1492 .1515 .1539 .1562 .1587
-0.9 .1611 .1635 .1660 .1685 .1711 .1736 .1762 .1788 .1814 .1841
-0.8 .1867 .1894 .1922 .1949 .1977 .2005 .2033 .2061 .2090 .2119
-0.7 .2148 .2177 .2206 .2236 .2266 .2296 .2327 .2358 .2389 .2420
-0.6 .2451 .2483 .2514 .2546 .2578 .2611 .2643 .2676 .2709 .2743
-0.5 .2776 .2810 .2843 .2877 .2912 .2946 .2981 .3015 .3050 .3085
-0.4 .3121 .3156 .3192 .3228 .3264 .3300 .3336 .3372 .3409 .3446
-0.3 .3483 .3520 .3557 .3594 .3632 .3669 .3707 .3745 .3783 .3821
-0.2 .3859 .3897 .3936 .3974 .4013 .4052 .4090 .4129 .4168 .4207
-0.1 .4247 .4286 .4325 .4364 .4404 .4443 .4483 .4522 .4562 .4602
-0.0 .4641 .4681 .4721 .4761 .4801 .4840 .4880 .4920 .4960 .5000


0.80 
0.90 
0.95 
0.99 


Table A-3, pp. 681-682, from Probability and Statistics for Engineers and Scientists, 6e
by Walpole, Myers, and Myers. Copyright 1997. Pearson Prentice Hall, Upper Saddle 
River, N.J. 




Table 4—Standard Normal Distribution (continued) 


   z   .00   .01   .02   .03   .04   .05   .06   .07   .08   .09
 0.0 .5000 .5040 .5080 .5120 .5160 .5199 .5239 .5279 .5319 .5359
 0.1 .5398 .5438 .5478 .5517 .5557 .5596 .5636 .5675 .5714 .5753
 0.2 .5793 .5832 .5871 .5910 .5948 .5987 .6026 .6064 .6103 .6141
 0.3 .6179 .6217 .6255 .6293 .6331 .6368 .6406 .6443 .6480 .6517
 0.4 .6554 .6591 .6628 .6664 .6700 .6736 .6772 .6808 .6844 .6879
 0.5 .6915 .6950 .6985 .7019 .7054 .7088 .7123 .7157 .7190 .7224
 0.6 .7257 .7291 .7324 .7357 .7389 .7422 .7454 .7486 .7517 .7549
 0.7 .7580 .7611 .7642 .7673 .7704 .7734 .7764 .7794 .7823 .7852
 0.8 .7881 .7910 .7939 .7967 .7995 .8023 .8051 .8078 .8106 .8133
 0.9 .8159 .8186 .8212 .8238 .8264 .8289 .8315 .8340 .8365 .8389
 1.0 .8413 .8438 .8461 .8485 .8508 .8531 .8554 .8577 .8599 .8621
 1.1 .8643 .8665 .8686 .8708 .8729 .8749 .8770 .8790 .8810 .8830
 1.2 .8849 .8869 .8888 .8907 .8925 .8944 .8962 .8980 .8997 .9015
 1.3 .9032 .9049 .9066 .9082 .9099 .9115 .9131 .9147 .9162 .9177
 1.4 .9192 .9207 .9222 .9236 .9251 .9265 .9279 .9292 .9306 .9319
 1.5 .9332 .9345 .9357 .9370 .9382 .9394 .9406 .9418 .9429 .9441
 1.6 .9452 .9463 .9474 .9484 .9495 .9505 .9515 .9525 .9535 .9545
 1.7 .9554 .9564 .9573 .9582 .9591 .9599 .9608 .9616 .9625 .9633
 1.8 .9641 .9649 .9656 .9664 .9671 .9678 .9686 .9693 .9699 .9706
 1.9 .9713 .9719 .9726 .9732 .9738 .9744 .9750 .9756 .9761 .9767
 2.0 .9772 .9778 .9783 .9788 .9793 .9798 .9803 .9808 .9812 .9817
 2.1 .9821 .9826 .9830 .9834 .9838 .9842 .9846 .9850 .9854 .9857
 2.2 .9861 .9864 .9868 .9871 .9875 .9878 .9881 .9884 .9887 .9890
 2.3 .9893 .9896 .9898 .9901 .9904 .9906 .9909 .9911 .9913 .9916
 2.4 .9918 .9920 .9922 .9925 .9927 .9929 .9931 .9932 .9934 .9936
 2.5 .9938 .9940 .9941 .9943 .9945 .9946 .9948 .9949 .9951 .9952
 2.6 .9953 .9955 .9956 .9957 .9959 .9960 .9961 .9962 .9963 .9964
 2.7 .9965 .9966 .9967 .9968 .9969 .9970 .9971 .9972 .9973 .9974
 2.8 .9974 .9975 .9976 .9977 .9977 .9978 .9979 .9979 .9980 .9981
 2.9 .9981 .9982 .9982 .9983 .9984 .9984 .9985 .9985 .9986 .9986
 3.0 .9987 .9987 .9987 .9988 .9988 .9989 .9989 .9989 .9990 .9990
 3.1 .9990 .9991 .9991 .9991 .9992 .9992 .9992 .9992 .9993 .9993
 3.2 .9993 .9993 .9994 .9994 .9994 .9994 .9994 .9995 .9995 .9995
 3.3 .9995 .9995 .9995 .9996 .9996 .9996 .9996 .9996 .9996 .9997
 3.4 .9997 .9997 .9997 .9997 .9997 .9997 .9997 .9997 .9997 .9998
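Table 4 lists the cumulative area to the left of z under the standard normal curve. That area can be written with the error function as Φ(z) = [1 + erf(z/√2)]/2, and Python's math module provides erf. The sketch below (function names are illustrative) reproduces table entries to four decimal places; it is a convenience check, not the method used to produce the printed table.

```python
from math import erf, sqrt


def standard_normal_area(z: float) -> float:
    """Cumulative area to the left of z: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def normal_entry(z: float) -> str:
    """Format an area like Table 4: four decimal places, no leading zero."""
    return f"{standard_normal_area(z):.4f}".lstrip("0")


print(normal_entry(-1.96))  # .0250
print(normal_entry(1.96))   # .9750
# Area between z = -1.96 and z = 1.96
print(round(standard_normal_area(1.96) - standard_normal_area(-1.96), 4))  # 0.95
```

As in the table, the area between two z-scores is the difference of their cumulative areas.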




Table 5—t-Distribution


a 
nh 


OONDOBRWHND —!: 


90 
100 
500 

1000 


oe) 


Level of 
confidence, c 
One tail, a 
Two tails, a 


0.80 
0.10 
0.20 
3.078 
1.886 
1.638 
533 
1.476 
1.440 
1.415 
1.397 
1.383 
Ls 
1.363 
1.356 
1.350 
1.345 
1.341 
1.337 
1.333 
1.330 
1.328 
1.325 
1.323 
324 
1.319 
1.318 
1.316 
1.315 
1.314 
1.313 
1.311 
1.310 
1.309 
1.309 
1.308 
1.307 
1.306 
1.306 
1.305 
1.304 
1.304 
1.303 
1.301 
299 
1.296 
1.294 
1.292 
i291 
1.290 
1.283 
1.282 
1.282 


0.90 
0.05 
0.10 
6.314 
2.920 
2.353 
22 
2.015 
1.943 
1.895 
1.860 
1.833 
1.812 
1.796 
1.782 
1.771 
1.761 
1.753 
1.746 
1.740 
1.734 
1.729 
e725 
1.721 
bl 
1.714 
1.711 
1.708 
1.706 
1.703 
1.701 
1.699 
1.697 
1.696 
1.694 
1.692 
1.691 
1.690 
1.688 
1.687 
1.686 
1.685 
1.684 
1.679 
1.676 
1.671 
1.667 
1.664 
1.662 
1.660 
1.648 
1.646 
1.645 


0.95 
0.025 
0.05 
12.706 
4.303 
3.182 
2.776 
2.571 
2.447 
2.365 
2.306 
2.262 
2.228 
2.201 
2.179 
2.160 
2.145 
2.131 
2.120 
2.110 
2.101 
2.093 
2.086 
2.080 
2.074 
2.069 
2.064 
2.060 
2.056 
2.052 
2.048 
2.045 
2.042 
2.040 
2.037 
2.035 
2.032 
2.030 
2.028 
2.026 
2.024 
2.023 
2.021 
2.014 
2.009 
2.000 
1.994 
1.990 
1.987 
1.984 
1.965 
1.962 
1.960 


0.98 
0.01 
0.02 
31.821 
6.965 
4.541 
3.747 
3.365 
3.143 
2.998 
2.896 
2.821 
2.764 
2.718 
2.681 
2.650 
2.624 
2.602 
2.583 
2.567 
2.552 
2.539 
2.528 
2.518 
2.508 
2.500 
2.492 
2.485 
2.479 
2.473 
2.467 
2.462 
2.457 
2.453 
2.449 
2.445 
2.441 
2.438 
2.434 
2.431 
2.429 
2.426 
2.423 
2.412 
2.403 
2.390 
2.381 
2.374 
2.368 
2.364 
2.334 
2.330 
2.326 


The critical values in Table 5 were generated using Excel. 


0.99 
0.005 
0.01 
63.657 
91925 
5.841 
4.604 
4.032 
3.707 
3.499 
3.355 
3.250 
SHl69 
3.106 
3.055 
3.012 
2.977 
2.947 
22924 
2.898 
2.878 
2.861 
2.845 
2.831 
2.819 
2.807 
2a 
2.787 
229 
2.771 
2.763 
2.756 
2.750 
2.744 
2.738 
2.733 
2.728 
2.724 
2.719 
2.715 
PAN 
2.708 
2.704 
2.690 
2.678 
2.660 
2.648 
2.639 
2.632 
2.626 
2.586 
2.581 
2.576 


c-confidence interval 


=f 


Left-tailed test 


Two-tailed test 
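Each entry is the critical value t_c with area α to its right in one tail (or α/2 in each of two tails) for the given d.f. The table note says Excel generated the values. The sketch below is an independent check of our own that uses only the Python standard library, not the source's method. The one-tail area is P(T > t) = ½·I_x(d.f./2, 1/2) with x = d.f./(d.f. + t²), where I_x(a, b) is the regularized incomplete beta function. That function is evaluated with the standard continued-fraction (Lentz) method, and bisection then solves for t_c. All function names are hypothetical. The ∞ row is not handled; it equals the standard normal z-value.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use I_x(a, b) = 1 - I_{1-x}(b, a) on the side where the fraction converges faster
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def t_upper_tail(t, df):
    """Area to the right of t >= 0 under a t-distribution with df degrees of freedom."""
    return 0.5 * regularized_incomplete_beta(df / 2.0, 0.5, df / (df + t * t))


def t_critical(alpha_one_tail, df):
    """Critical value t_c with area alpha_one_tail to its right, found by bisection."""
    lo, hi = 0.0, 1.0
    while t_upper_tail(hi, df) > alpha_one_tail:
        hi *= 2.0  # widen the bracket until it contains t_c
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if t_upper_tail(mid, df) > alpha_one_tail:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# c = 0.95 means alpha = 0.05 in two tails, that is, 0.025 in one tail
print(round(t_critical(0.025, 10), 3))
```

For a confidence level c, pass (1 − c)/2 as the one-tail area. The d.f. = 10 call above should print a value matching the table entry 2.228.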


Table 6—Chi-Square Distribution

(Figures: right tail; two tails)

Degrees of   α
freedom      0.995   0.99    0.975   0.95    0.90    0.10     0.05     0.025    0.01     0.005
1            –       –       0.001   0.004   0.016   2.706    3.841    5.024    6.635    7.879
2            0.010   0.020   0.051   0.103   0.211   4.605    5.991    7.378    9.210    10.597
3            0.072   0.115   0.216   0.352   0.584   6.251    7.815    9.348    11.345   12.838
4            0.207   0.297   0.484   0.711   1.064   7.779    9.488    11.143   13.277   14.860
5            0.412   0.554   0.831   1.145   1.610   9.236    11.071   12.833   15.086   16.750
6            0.676   0.872   1.237   1.635   2.204   10.645   12.592   14.449   16.812   18.548
7            0.989   1.239   1.690   2.167   2.833   12.017   14.067   16.013   18.475   20.278
8            1.344   1.646   2.180   2.733   3.490   13.362   15.507   17.535   20.090   21.955
9            1.735   2.088   2.700   3.325   4.168   14.684   16.919   19.023   21.666   23.589
10           2.156   2.558   3.247   3.940   4.865   15.987   18.307   20.483   23.209   25.188
11           2.603   3.053   3.816   4.575   5.578   17.275   19.675   21.920   24.725   26.757
12           3.074   3.571   4.404   5.226   6.304   18.549   21.026   23.337   26.217   28.299
13           3.565   4.107   5.009   5.892   7.042   19.812   22.362   24.736   27.688   29.819
14           4.075   4.660   5.629   6.571   7.790   21.064   23.685   26.119   29.141   31.319
15           4.601   5.229   6.262   7.261   8.547   22.307   24.996   27.488   30.578   32.801
16           5.142   5.812   6.908   7.962   9.312   23.542   26.296   28.845   32.000   34.267
17           5.697   6.408   7.564   8.672   10.085  24.769   27.587   30.191   33.409   35.718
18           6.265   7.015   8.231   9.390   10.865  25.989   28.869   31.526   34.805   37.156
19           6.844   7.633   8.907   10.117  11.651  27.204   30.144   32.852   36.191   38.582
20           7.434   8.260   9.591   10.851  12.443  28.412   31.410   34.170   37.566   39.997
21           8.034   8.897   10.283  11.591  13.240  29.615   32.671   35.479   38.932   41.401
22           8.643   9.542   10.982  12.338  14.042  30.813   33.924   36.781   40.289   42.796
23           9.260   10.196  11.689  13.091  14.848  32.007   35.172   38.076   41.638   44.181
24           9.886   10.856  12.401  13.848  15.659  33.196   36.415   39.364   42.980   45.559
25           10.520  11.524  13.120  14.611  16.473  34.382   37.652   40.646   44.314   46.928
26           11.160  12.198  13.844  15.379  17.292  35.563   38.885   41.923   45.642   48.290
27           11.808  12.879  14.573  16.151  18.114  36.741   40.113   43.194   46.963   49.645
28           12.461  13.565  15.308  16.928  18.939  37.916   41.337   44.461   48.278   50.993
29           13.121  14.257  16.047  17.708  19.768  39.087   42.557   45.722   49.588   52.336
30           13.787  14.954  16.791  18.493  20.599  40.256   43.773   46.979   50.892   53.672
40           20.707  22.164  24.433  26.509  29.051  51.805   55.758   59.342   63.691   66.766
50           27.991  29.707  32.357  34.764  37.689  63.167   67.505   71.420   76.154   79.490
60           35.534  37.485  40.482  43.188  46.459  74.397   79.082   83.298   88.379   91.952
70           43.275  45.442  48.758  51.739  55.329  85.527   90.531   95.023   100.425  104.215
80           51.172  53.540  57.153  60.391  64.278  96.578   101.879  106.629  112.329  116.321
90           59.196  61.754  65.647  69.126  73.291  107.565  113.145  118.136  124.116  128.299
100          67.328  70.065  74.222  77.929  82.358  118.498  124.342  129.561  135.807  140.169


D. B. Owen, HANDBOOK OF STATISTICAL TABLES, A.5, Published by 
Addison Wesley Longman, Inc. 
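Each entry is the value of χ² that has area α to its right. The table credits Owen's handbook. The sketch below is our own standard-library check, not the source's method. It writes the right-tail area as the regularized upper incomplete gamma function Q(d.f./2, χ²/2), evaluated with the usual power series or continued fraction depending on the argument, and then finds the χ² value with right-tail area α by bisection. The function names are hypothetical.

```python
import math


def _lower_gamma_series(a, x):
    """Regularized lower incomplete gamma P(a, x) by its power series (use for x < a + 1)."""
    term = total = 1.0 / a
    denom = a
    for _ in range(10_000):
        denom += 1.0
        term *= x / denom
        total += term
        if abs(term) < abs(total) * 1e-15:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def _upper_gamma_continued_fraction(a, x, tiny=1e-300):
    """Regularized upper incomplete gamma Q(a, x) by a continued fraction (use for x >= a + 1)."""
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = b + an / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(-x + a * math.log(x) - math.lgamma(a))


def chi2_upper_tail(x, df):
    """Area to the right of x under a chi-square distribution with df degrees of freedom."""
    if x <= 0.0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    if half_x < a + 1.0:
        return 1.0 - _lower_gamma_series(a, half_x)
    return _upper_gamma_continued_fraction(a, half_x)


def chi2_critical(alpha, df):
    """Chi-square value with right-tail area alpha (the quantity tabulated in Table 6)."""
    lo, hi = 0.0, 1.0
    while chi2_upper_tail(hi, df) > alpha:
        hi *= 2.0  # widen the bracket until it contains the critical value
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_upper_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(chi2_critical(0.05, 10), 3))   # compare with 18.307 in the table
print(round(chi2_critical(0.995, 2), 3))   # compare with 0.010 in the table
```

For a two-tailed use, look up α/2 on the right (for example 0.025) and 1 − α/2 on the left (for example 0.975). This matches how the columns of Table 6 are arranged.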


Table 7—F-Distribution

α = 0.005
d.f.N: Degrees of freedom, numerator (columns): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 24, 30, 40, 60, 120, ∞
d.f.D: Degrees of freedom, denominator (rows): 1 through 30, 40, 60, 120, ∞
(Table entries not recoverable from the source scan.)


Table 7—F-Distribution (continued)

α = 0.01
d.f.N: Degrees of freedom, numerator (columns): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 24, 30, 40, 60, 120, ∞
d.f.D: Degrees of freedom, denominator (rows): 1 through 30, 40, 60, 120, ∞
(Table entries not recoverable from the source scan.)


Table 7—F-Distribution (continued)

α = 0.025
d.f.N: Degrees of freedom, numerator (columns): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 24, 30, 40, 60, 120, ∞
d.f.D: Degrees of freedom, denominator (rows): 1 through 30, 40, 60, 120, ∞
(Table entries not recoverable from the source scan.)


Table 7—F-Distribution (continued)

α = 0.05
d.f.N: Degrees of freedom, numerator (columns): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 24, 30, 40, 60, 120, ∞
d.f.D: Degrees of freedom, denominator (rows): 1 through 30, 40, 60, 120, ∞
(Table entries not recoverable from the source scan.)


Table 7—F-Distribution (continued)

α = 0.10
d.f.N: Degrees of freedom, numerator (columns): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 24, 30, 40, 60, 120, ∞
d.f.D: Degrees of freedom, denominator (rows): 1 through 30, 40, 60, 120, ∞
(Table entries not recoverable from the source scan.)

From M. Merrington and C. M. Thompson, “Table of Percentage Points of the Inverted Beta (F) Distribution,” Biometrika 33 (1943), pp. 74–87. Oxford University Press.
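Any finite entry of Table 7 can be reproduced numerically. The sketch below is our own check, not the method behind the published table, which credits Merrington and Thompson. It writes the right-tail area of an F-distribution as a regularized incomplete beta function, P(F > f) = I_x(d.f.D/2, d.f.N/2) with x = d.f.D/(d.f.D + d.f.N·f), and uses bisection to find the f with right-tail area α. It handles finite degrees of freedom only, not the ∞ row or column. The function names are hypothetical.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))  # even term
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd term
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_upper_tail(f, dfn, dfd):
    """Area to the right of f >= 0 under an F-distribution with (dfn, dfd) degrees of freedom."""
    return regularized_incomplete_beta(dfd / 2.0, dfn / 2.0, dfd / (dfd + dfn * f))


def f_critical(alpha, dfn, dfd):
    """Critical value F with right-tail area alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_upper_tail(hi, dfn, dfd) > alpha:
        hi *= 2.0  # widen the bracket until it contains the critical value
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if f_upper_tail(mid, dfn, dfd) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(f_critical(0.05, 2, 10), 2))  # alpha = 0.05, d.f.N = 2, d.f.D = 10
```

Passing d.f.N (numerator) before d.f.D (denominator) follows the table's layout, where d.f.N indexes the columns and d.f.D indexes the rows.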


Table 8—Critical Values for the Sign Test

Reject the null hypothesis when the test statistic x is less than or equal to the value in the table.

      One-tailed α:   0.005   0.01    0.025   0.05
n     Two-tailed α:   0.01    0.02    0.05    0.10
(Table entries not recoverable from the source scan.)

Note: Table 8 is for one-tailed or two-tailed tests. The sample size n represents the total number of + and − signs. The test value is the smaller number of + or − signs.

From Journal of the American Statistical Association Vol. 41 (1946), pp. 557–566. W. J. Dixon and A. M. Mood.
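Under H₀, the number of + signs (and the number of − signs) follows a binomial distribution with p = 0.5. A critical value can therefore be sketched as the largest x whose cumulative probability P(X ≤ x) does not exceed the one-tailed α; for a two-tailed test, use α/2. This rule is our assumption about how such tables are built, not a description of how Dixon and Mood computed theirs. Where a result disagrees with Table 8, use the table. The function name is hypothetical.

```python
from math import comb


def sign_test_critical(n, alpha_one_tail):
    """Largest x with P(X <= x) <= alpha for X ~ Binomial(n, 0.5), or None if none exists."""
    total = 2 ** n
    cumulative = 0
    critical = None
    for x in range(n + 1):
        cumulative += comb(n, x)
        # Exact integer comparison: cumulative / 2**n <= alpha
        if cumulative <= alpha_one_tail * total:
            critical = x
        else:
            break
    return critical


# Two-tailed alpha = 0.05 with n = 25 signs: use 0.025 in one tail
print(sign_test_critical(25, 0.025))
```

A `None` result corresponds to a blank table entry: no outcome is unlikely enough to reject H₀ at that level for that n.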


Table 9—Critical Values for the Wilcoxon Signed-Rank Test 


Reject the null hypothesis when the test statistic wₛ is less than or equal to the value in the table.

      One-tailed α:   0.05    0.025   0.01    0.005
n     Two-tailed α:   0.10    0.05    0.02    0.01
5                     1       –       –       –
6                     2       1       –       –
7                     4       2       0       –
8                     6       4       2       0
9                     8       6       3       2
10                    11      8       5       3
11                    14      11      7       5
12                    17      14      10      7
13                    21      17      13      10
14                    26      21      16      13
15                    30      25      20      16
16                    36      30      24      19
17                    41      35      28      23
18                    47      40      33      28
19                    54      46      38      32
20                    60      52      43      37
21                    68      59      49      43
22                    75      66      56      49
23                    83      73      62      55
24                    92      81      69      61
25                    101     90      77      68
(Rows for n = 26 and n = 27 are illegible in the source scan.)
28                    130     117     102     92
29                    141     127     111     100
30                    152     137     120     109

From Some Rapid Approximate Statistical Procedures. Copyright 1949, 1964 Lederle Laboratories, American Cyanamid Co., Wayne, N.J.
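The statistic wₛ compared with Table 9 is the smaller of the absolute values of the sum of the positive ranks and the sum of the negative ranks. Zero differences are discarded first, the remaining differences are ranked by absolute value, and tied absolute values receive the average of their ranks. Below is a minimal sketch with a hypothetical function name and made-up differences.

```python
def wilcoxon_ws(differences):
    """Return (w_s, n): the smaller absolute rank sum and the number of nonzero differences."""
    nonzero = [d for d in differences if d != 0]   # discard zero differences
    ordered = sorted(nonzero, key=abs)             # rank by absolute value
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over a run of tied absolute values
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + j) / 2 + 1             # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    positive = sum(r for d, r in zip(ordered, ranks) if d > 0)
    negative = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return min(positive, negative), len(ordered)


diffs = [3, -1, 4, -2, 5, 6]   # made-up paired differences
w_s, n = wilcoxon_ws(diffs)
critical = 1                   # Table 9: n = 6, two-tailed alpha = 0.05
print(w_s, n, "reject H0" if w_s <= critical else "fail to reject H0")
```

For these made-up differences, n = 6 and wₛ = 3. Because 3 is greater than the tabled value 1, H₀ would not be rejected at two-tailed α = 0.05.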


Table 10—Critical Values for the Spearman Rank Correlation Coefficient

Reject H₀: ρₛ = 0 when the absolute value of rₛ is greater than the value in the table.

n     α = 0.10   α = 0.05   α = 0.01
5     0.900      –          –
6     0.829      0.886      –
7     0.714      0.786      0.929
8     0.643      0.738      0.881
9     0.600      0.700      0.833
10    0.564      0.648      0.794
11    0.536      0.618      0.818
12    0.497      0.591      0.780
13    0.475      0.566      0.745
14    0.457      0.545      0.716
15    0.441      0.525      0.689
16    0.425      0.507      0.666
17    0.412      0.490      0.645
18    0.399      0.476      0.625
19    0.388      0.462      0.608
20    0.377      0.450      0.591
21    0.368      0.438      0.576
22    0.359      0.428      0.562
23    0.351      0.418      0.549
24    0.343      0.409      0.537
25    0.336      0.400      0.526
26    0.329      0.392      0.515
27    0.323      0.385      0.505
28    0.317      0.377      0.496
29    0.311      0.370      0.487
30    0.305      0.364      0.478

From the Institute of Mathematical Statistics.

Table 11—Critical Values for the Pearson Correlation Coefficient

The correlation is significant when the absolute value of r is greater than the value in the table.

n     α = 0.05   α = 0.01
4     0.950      0.990
5     0.878      0.959
6     0.811      0.917
7     0.754      0.875
8     0.707      0.834
9     0.666      0.798
10    0.632      0.765
11    0.602      0.735
12    0.576      0.708
13    0.553      0.684
14    0.532      0.661
15    0.514      0.641
16    0.497      0.623
17    0.482      0.606
18    0.468      0.590
19    0.456      0.575
20    0.444      0.561
21    0.433      0.549
22    0.423      0.537
23    0.413      0.526
24    0.404      0.515
25    0.396      0.505
26    0.388      0.496
27    0.381      0.487
28    0.374      0.479
29    0.367      0.471
30    0.361      0.463
35    0.334      0.430
40    0.312      0.403
45    0.294      0.380
50    0.279      0.361
55    0.266      0.345
60    0.254      0.330
65    0.244      0.317
70    0.235      0.306
75    0.227      0.296
80    0.220      0.286
85    0.213      0.278
90    0.207      0.270
95    0.202      0.263
100   0.197      0.256

The critical values in Table 11 were generated using Excel.
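The entries in Table 11 are consistent with the t-test for a correlation coefficient, t = r√(n − 2)/√(1 − r²), with d.f. = n − 2. Solving for r gives r_c = t_c/√(t_c² + n − 2), where t_c is the two-tailed critical value from Table 5. The sketch below checks a few rows using t-values copied from Table 5. The function name is hypothetical, and this is our derivation, not a statement of how the Excel values were produced.

```python
import math


def pearson_critical_r(t_critical, n):
    """Critical |r| implied by a two-tailed t critical value with d.f. = n - 2."""
    return t_critical / math.sqrt(t_critical ** 2 + n - 2)


# (n, two-tailed t_c from Table 5 with d.f. = n - 2, value listed in Table 11)
checks = [
    (4, 4.303, 0.950),   # alpha = 0.05, d.f. = 2
    (10, 2.306, 0.632),  # alpha = 0.05, d.f. = 8
    (30, 2.763, 0.463),  # alpha = 0.01, d.f. = 28
]
for n, t_c, table_r in checks:
    print(n, round(pearson_critical_r(t_c, n), 3), table_r)
```

Each computed value, rounded to three decimal places, should match the corresponding Table 11 entry.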


Table 12—Critical Values for the Number of Runs

Reject the null hypothesis when the test statistic G is less than or equal to the smaller entry or greater than or equal to the larger entry.

Value of n₂
(Table entries not recoverable from the source scan.)

Note: Table 12 is for a two-tailed test with α = 0.05.

From the Institute of Mathematical Statistics.


APPENDIX C

What You Should Learn

• How to construct and interpret a normal probability plot


Normal Probability Plots 


For many of the examples and exercises in this text, it has been assumed that a 
random sample is selected from a population that has a normal distribution. After 
selecting a random sample from a population with an unknown distribution, how 
can you determine whether the sample was selected from a population that has 
a normal distribution? 

You have already learned that a histogram or stem-and-leaf plot can reveal 
the shape of a distribution and any outliers, clusters, or gaps in a distribution 
(see Sections 2.1, 2.2, and 2.3). These data displays are useful for assessing large 
sets of data, but assessing small data sets in this manner can be difficult and 
unreliable. A reliable method for assessing normality in any data set is to use a 
normal probability plot. 


DEFINITION 


A normal probability plot (also called a normal quantile plot) is a graph that plots each observed value from the data set along with its expected z-score. The observed values are usually plotted along the horizontal axis while the expected z-scores are plotted along the vertical axis.


The guidelines below can help you determine whether data come from a 
population that has a normal distribution. 


1. If the plotted points in a normal probability plot are approximately linear, 
then you can conclude that the data come from a normal distribution. 


2. If the plotted points are not approximately linear or follow some type of 
pattern that is not linear, then you can conclude that the data come from a 
distribution that is not normal. 


3. Multiple outliers or clusters of points indicate a distribution that is not normal. 
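To see what such a plot contains, you can compute the points yourself. The sketch below pairs each sorted observation with an expected z-score and uses the 12 player heights from Example 1 in this appendix. The text does not say which formula technology uses for expected z-scores. Here we assume Blom's approximation, z_i = Φ⁻¹((i − 3/8)/(n + 1/4)), so the values may differ slightly from Minitab or the TI-84 Plus. As a rough numerical companion to guideline 1 (our addition, not one of the text's guidelines), the sketch also reports the correlation between observed values and expected z-scores. That correlation is close to 1 when the points are nearly linear. The function names are hypothetical.

```python
from statistics import NormalDist, mean


def normal_scores(n):
    """Expected z-scores for a sample of size n, assuming Blom's formula (i - 3/8)/(n + 1/4)."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def probability_plot_points(data):
    """Pairs (observed value, expected z-score), with observed values sorted ascending."""
    observed = sorted(data)
    return list(zip(observed, normal_scores(len(observed))))


def pearson_r(xs, ys):
    """Sample correlation coefficient between xs and ys."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


heights = [74, 69, 78, 75, 73, 71, 80, 82, 81, 76, 86, 77]  # Example 1
points = probability_plot_points(heights)
xs, zs = zip(*points)  # observed values (horizontal axis), expected z-scores (vertical axis)
r = pearson_r(xs, zs)
print(f"correlation between observed values and expected z-scores: {r:.3f}")
```

Plotting `xs` against `zs` reproduces the layout described in the definition. A correlation near 1 is only a summary of how straight the points are, so the visual checks for patterns, outliers, and clusters still apply.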


Two normal probability plots are shown below. The normal probability plot on 
the left is approximately linear. So, you can conclude that the data come from 
a population that has a normal distribution. The normal probability plot on the 
right follows a nonlinear pattern. So, you can conclude that the data do not come 
from a population that has a normal distribution. 


(Figure: two normal probability plots, each with expected z-score on the vertical axis and observed value on the horizontal axis; the plot on the left is approximately linear and the plot on the right follows a nonlinear pattern.)




Constructing a normal probability plot by hand can be rather tedious. You 
can use technology such as Minitab, Excel, StatCrunch, or the TI-84 Plus to 
construct a normal probability plot, as shown in Example 1. 


Constructing a Normal Probability Plot 


The heights (in inches) of 12 randomly selected current National Basketball 
Association players are listed. Use technology to construct a normal probability 
plot to determine whether the data come from a population that has a normal 
distribution. 


74 69 78 75 73 71 80 82 81 76 86 77 


SOLUTION 


Using Minitab, enter the heights into column C1. From the Graph menu,
select “Probability Plot,” choose the option “Single,” and click OK. Next,
select column C1 as the graph variable. Then click “Distribution” and choose
“Normal” from the drop-down menu. Click the Data Display tab, select
“Symbols only,” and click OK. After clicking “Scale,” click the Y-Scale Type
tab, select “Score,” and click OK. Click OK to construct the normal probability
plot. Your result should be similar to the one shown below. (To construct a
normal probability plot using a TI-84 Plus, follow the instructions in the Tech
Tip at the left.)


Tech Tip 


Here are instructions 
for constructing a 
normal probability plot 
using a TI-84 Plus. 
First, enter the data 
into List 1. Then use 
Stat Plot to construct the normal 
probability plot, as shown below. 


(Figure: Minitab normal probability plot titled “Normal Probability Plot of Player Heights”; horizontal axis: Player heights, 70.0 to 87.5.)


Interpretation Because the points are approximately linear, you can conclude 
that the sample data come from a population that has a normal distribution. 


TRY IT YOURSELF 1
The balances (in dollars) on student loans for 18 randomly selected college
seniors are listed. Use technology to construct a normal probability plot
to determine whether the data come from a population that has a normal
distribution.

29,150 16,980 12,470 19,235 15,875 8,960 16,105 14,575 39,860
20,170 9,710 19,650 21,590 8,200 18,100 25,530 9,285 10,075

Answer: Page A39


To see that the points are approximately linear, you can graph the regression 
line for the observed values from the data set and their expected z-scores. The 
regression line for the heights and expected z-scores from Example 1 is shown 
in the graph at the left. From the graph, you can see that the points lie along 
the regression line. You can also approximate the mean of the data set by 
determining where the line crosses the x-axis. 
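Here is a small sketch of this idea, again assuming Blom's formula for the expected z-scores (the text does not specify one). It fits the least-squares line of expected z-score on observed value for the heights in Example 1, then solves for where the line crosses z = 0. Under this formula the expected z-scores are symmetric about 0, so their mean is 0. The fitted line therefore passes through (x̄, 0), and the crossing point equals the sample mean, not just an approximation of it.

```python
from statistics import NormalDist, mean


def normal_scores(n):
    """Expected z-scores for a sample of size n, assuming Blom's formula (i - 3/8)/(n + 1/4)."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


heights = [74, 69, 78, 75, 73, 71, 80, 82, 81, 76, 86, 77]  # Example 1
xs = sorted(heights)              # observed values (horizontal axis)
zs = normal_scores(len(xs))       # expected z-scores (vertical axis)

# Least-squares regression line z = slope * x + intercept
mx, mz = mean(xs), mean(zs)
slope = sum((x - mx) * (z - mz) for x, z in zip(xs, zs)) / sum((x - mx) ** 2 for x in xs)
intercept = mz - slope * mx

x_at_z0 = -intercept / slope      # where the line crosses the x-axis (z = 0)
print(round(x_at_z0, 2), round(mean(heights), 2))
```

Both printed values are about 76.83 inches. If a different expected z-score formula is used and the scores are not exactly symmetric, the crossing point is only close to the mean.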


(Figure: normal probability plot of player heights with the regression line; horizontal axis: Player heights.)




C EXERCISES 


For Extra Help: MyLab Statistics 
1. In a normal probability plot, what is usually plotted along the horizontal axis?
What is usually plotted along the vertical axis? 


2. Describe how you can use a normal probability plot to determine whether 
data come from a normal distribution. 


Graphical Analysis In Exercises 3 and 4, use the histogram and normal
probability plot to determine whether the data come from a normal distribution. 
Explain your reasoning. 


3. Roller Coaster Heights
[Figure: Histogram of roller coaster heights (Height, in feet) and normal probability plot (Observed value vs. expected z-score)]

4. Female Femur Lengths
[Figure: Histogram of female femur lengths (Length, in centimeters) and normal probability plot (Observed value vs. expected z-score)]


Constructing a Normal Probability Plot In Exercises 5 and 6, use
technology to construct a normal probability plot to determine whether the data
come from a population that has a normal distribution.


5. Reaction Times The reaction times (in milliseconds) of 30 randomly
selected adult females to an auditory stimulus


507 389 305 291 336 310 514 442 
373 428 387 454 323 441 388 426 
411 382 320 450 309 416 359 388 
307 337 469 351 422 413 


6. Triglyceride Levels The triglyceride levels (in milligrams per deciliter
of blood) of 26 randomly selected patients


209 140 155 170 265 138 180 
295 250 320 270 225 215 390 
420 462 150 200 400 295 240 
200 190 145 160 175 


TRY IT YOURSELF ANSWERS 


Chapter 1 
Section 1.1 


1. The population consists of the responses of all ninth to 
twelfth graders in the United States. The sample consists 
of the responses of the 1501 ninth to twelfth graders in the 
survey. The sample data set consists of 1215 ninth to twelfth 
graders who said leaders today are more concerned with 
their own agenda than with achieving the overall goals of the 
organization they serve and 286 ninth to twelfth graders who 
did not say that. 

2. a. Population parameter, because the total spent on
employees’ salaries, $5,150,694, is based on the entire
company.

b. Sample statistic, because 43% is based on a subset of the
population.

3. a. The population consists of the responses of all U.S. adults,
and the sample consists of the responses of the 1000 U.S.
adults in the study.

b. The part of this study that represents the descriptive
branch of statistics involves the statement “three out of
four adults will consult with their physician or pharmacist
and only 8% visit a medication-specific website [when
they have a question about their medication].”

c. A possible inference drawn from the study is that most
adults consult with their physician or pharmacist when
they have a question about their medication.


Section 1.2 


1. The city names are nonnumerical entries, so these are 
qualitative data. The city populations are numerical entries, 
so these are quantitative data. 

2. (1) Ordinal, because the data can be put in order. 

(2) Nominal, because no mathematical computations can be 
made. 

3. (1) Interval, because the data can be ordered and meaningful 
differences can be calculated, but it does not make sense 
to write a ratio using the temperatures. 

(2) Ratio, because the data can be ordered, meaningful 
differences can be calculated, the data can be written as 
a ratio, and the data set contains an inherent zero. 


Section 1.3 


1. This is an observational study. 


2. There is no way to tell why the people quit smoking. They
could have quit smoking as a result of either chewing the
gum or watching the DVD. The gum and the DVD could be
confounding variables. To improve the study, two experiments
could be done, one using the gum and the other using the
DVD. Alternatively, conduct just one experiment using either
the gum or the DVD.

3. Sample answer: Assign numbers 1 to 79 to the employees of
the company. Use the table of random numbers and obtain
63, 7, 40, 19, and 26. The employees assigned these numbers
will make up the sample.

4. (1) The sample was selected by using the students in a
randomly chosen class. This is cluster sampling.

(2) The sample was selected by numbering each student in
the school, randomly choosing a starting number, and
selecting students at regular intervals from the starting
number. This is systematic sampling.


Chapter 2 


Section 2.1 
1.
Class | Frequency, f
14-20 | 8
21-27 | 15
28-34 | 14
35-41 | 7
42-48 | 4
49-55 | 3

2.
Class | Frequency, f | Midpoint | Relative frequency | Cumulative frequency
14-20 | 8 | 17 | 0.1569 | 8
21-27 | 15 | 24 | 0.2941 | 23
28-34 | 14 | 31 | 0.2745 | 37
35-41 | 7 | 38 | 0.1373 | 44
42-48 | 4 | 45 | 0.0784 | 48
49-55 | 3 | 52 | 0.0588 | 51
Σf = 51 | Σ(f/n) = 1


Sample answer: The most common range of points scored by
winning teams is 21 to 27. About 14% of the winning teams
scored more than 41 points.


3.
[Figure: Histogram, Points Scored by Winning Super Bowl Teams (horizontal axis: Points scored)]
Sample answer: The most common range of points scored by
winning teams is 21 to 27. About 14% of the winning teams
scored more than 41 points.


4.
[Figure: Frequency polygon, Points Scored by Winning Super Bowl Teams (Frequency vs. Points scored)]


Sample answer: The frequency of points scored increases up 
to 24 points and then decreases. 


5.
[Figure: Relative frequency histogram, Points Scored by Winning Super Bowl Teams (Relative frequency vs. Points scored)]

6.
[Figure: Ogive, Points Scored by Winning Super Bowl Teams (Cumulative frequency vs. Points scored)]
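The midpoint, relative frequency, and cumulative frequency columns in answer 2 follow from three rules: midpoint = (lower limit + upper limit)/2, relative frequency = f/n, and cumulative frequency = running total of f. A minimal Python sketch (the function name `frequency_table` is hypothetical) rebuilds that table from the class limits and frequencies in answer 1:

```python
def frequency_table(classes):
    """Build frequency-distribution rows from ((lower, upper), frequency) pairs.

    Each row is (lower, upper, f, midpoint, relative frequency, cumulative frequency).
    """
    n = sum(f for _, f in classes)
    rows = []
    cumulative = 0
    for (lower, upper), f in classes:
        cumulative += f  # running total of the frequencies
        midpoint = (lower + upper) / 2
        rows.append((lower, upper, f, midpoint, f / n, cumulative))
    return rows

# Class limits and frequencies from answer 1 (points scored by winning Super Bowl teams)
points_classes = [((14, 20), 8), ((21, 27), 15), ((28, 34), 14),
                  ((35, 41), 7), ((42, 48), 4), ((49, 55), 3)]

for lower, upper, f, midpoint, rel, cum in frequency_table(points_classes):
    print(f"{lower}-{upper}  f={f:<2}  midpoint={midpoint:g}  rel={rel:.4f}  cum={cum}")
```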


Section 2.2 


1.
1 | 46667    Key: 1|4 = 14
2 | 00011133444467777789
3 | 0111122344445557889
4 | 23689
5 | 25

Sample answer: Most of the winning teams scored between 20
and 39 points.


2.
1 | 4    Key: 1|4 = 14
1 | 6667
2 | 000111334444
2 | 67777789
3 | 011112234444
3 | 5557889
4 | 23
4 | 689
5 | 2
5 | 5

Sample answer: Most of the winning teams scored from 20 to
35 points.


3.
[Figure: Dot plot, Points Scored by Winning Super Bowl Teams (horizontal axis: Points scored)]


Sample answer: Most of the points scored by winning teams 
cluster between 20 and 40. 


4.
[Figure: Pie chart, Earned Degrees Conferred in 1990 (recovered labels: Associate’s 23.5%, Master’s 17.0%, Doctoral 5.4%)]


From 1990 to 2014, as percentages of the total degrees 
conferred, associate’s degrees increased by 2.9%, bachelor’s 
degrees decreased by 5.1%, master’s degrees increased by 
2.8%, and doctoral degrees decreased by 0.7%. 


5.
[Figure: Pareto chart, Causes of BBB Complaints (Frequency vs. Cause): Collection agencies, Auto dealers (used cars), Insurance companies, Travel agencies and bureaus, Mortgage brokers]


Collection agencies are the greatest cause of complaints. 


6.
[Figure: Scatter plot, Salaries vs. Length of employment (in years)]


It appears that the longer an employee is with the company, 
the greater the employee’s salary. 


7.
[Figure: Time series chart, Burglaries (in millions) vs. Year, 2005 to 2015]


Sample answer: The number of burglaries remained about 
the same until 2012 and then decreased through 2015. 


Section 2.3 
1. About 30.2 2. 30 3. 28.5 4. 27 5. “some”
6. x̄ ≈ 21.6; median = 21; mode = 20
The mean in Example 6 (x̄ ≈ 23.8) was heavily influenced
by the entry 65. Neither the median nor the mode was
affected as much by the entry 65.

7. About 2.6

8. About 30.0; This is very close to the mean found using the
original data set.


Section 2.4 


1. 35, or $35,000; The range of the starting salaries for
Corporation B, which is $35,000, is much larger than the
range of Corporation A.

2. σ² ≈ 110.3; σ ≈ 10.5, or $10,500

3. s² ≈ 177.1; s ≈ 13.3 4. x̄ ≈ 19.8; s ≈ 7.8

5. Sample answer: 7, 7, 7, 7, 7, 13, 13, 13, 13, 13 6. 34.13%

7. At least 75% of Iowa’s population is between 0 and 86.3 years
old. Because 80 < 86.3, an age of 80 lies within two standard
deviations of the mean. So, the age is not unusual.
8. x̄ ≈ 1.7; s ≈ 1.5

Both the mean and sample standard deviation decreased 
slightly. 

9. x̄ ≈ 195.5; s ≈ 169.5

Both the mean and sample standard deviation increased.

10. Los Angeles: CV ≈ 47.2%

Dallas: CV ≈ 39.4%

The office rental rates are more variable in Los Angeles
than in Dallas.
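These measures can be checked with Python's standard `statistics` module. The sketch shows the population standard deviation of the sample answer to item 5 (mean 10, population standard deviation 3), the coefficient of variation CV = (standard deviation / mean) × 100%, and Chebyshev's lower bound 1 − 1/k². Treating answer 4 (x̄ ≈ 19.8, s ≈ 7.8) as the Dallas statistics is an inference, based on 7.8/19.8 reproducing the 39.4% in answer 10. The helper names are hypothetical.

```python
from statistics import fmean, pstdev, stdev

def coefficient_of_variation(mean, std_dev):
    """CV as a percentage: (standard deviation / mean) * 100."""
    return std_dev / mean * 100

def chebyshev_minimum_proportion(k):
    """Chebyshev's Theorem: at least 1 - 1/k**2 of the data lie within k standard deviations (k > 1)."""
    return 1 - 1 / k**2

sample_answer = [7, 7, 7, 7, 7, 13, 13, 13, 13, 13]  # sample answer to item 5
print(fmean(sample_answer), pstdev(sample_answer), round(stdev(sample_answer), 2))

# Assumption: answer 4 gives the Dallas statistics (x-bar = 19.8, s = 7.8)
print(round(coefficient_of_variation(19.8, 7.8), 1))

print(chebyshev_minimum_proportion(2))  # minimum proportion within 2 standard deviations
```

Note the difference between `pstdev` (divides by N) and `stdev` (divides by n − 1): the same ten values give 3 as a population and about 3.16 as a sample.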




Section 2.5 


1. Q1 = 23, Q2 = 30, Q3 = 35
About one-quarter of the winning scores were 23 points
or less, about one-half were 30 points or less, and about
three-quarters were 35 points or less.

2. Q1 = 23.5, Q2 = 30, Q3 = 41
About one-quarter of these universities charge tuition of
$23,500 or less, about one-half charge $30,000 or less, and
about three-quarters charge $41,000 or less.


3. IQR = 12; 55 is an outlier.

4. [Figure: Box-and-whisker plot, Points Scored by Winning Super Bowl Teams, with labeled values 14, 23, 30, 35, and 55 (horizontal axis: Points)]

About 50% of the winning scores were between 23 and
35 points. About 25% of the winning scores were less than
23 points. About 25% of the winning scores were greater than
35 points.


5. 19.5; About 10% of the winning scores were 19 points or less.
6. 28th percentile
7. For $60, z = −1.25.
For $71, z = 0.125.
For $92, z = 2.75.


8. Man: z = −3.3; Woman: z ≈ −1.7

The 5-foot-tall man’s height is 3.3 standard deviations
below the mean. This is a very unusual height for a man. The
5-foot-tall woman’s height is 1.7 standard deviations
below the mean. This is among the typical heights for a woman.
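Answers 3 and 7 rest on two formulas: the 1.5 × IQR outlier fences, Q1 − 1.5(IQR) and Q3 + 1.5(IQR), and the standard score z = (x − μ)/σ. The sketch assumes μ = 70 and σ = 8; these values are not stated in this section and are inferred only because they reproduce all three z-scores in answer 7. The function names are hypothetical.

```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) using the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def z_score(x, mu, sigma):
    """Standard score: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma

# Quartiles from answer 1; any entry outside the fences is an outlier
lower, upper = outlier_fences(23, 35)
print(lower, upper, 55 > upper)

# Assumption: mu = 70 and sigma = 8 (inferred from the z-scores in answer 7)
for price in (60, 71, 92):
    print(price, z_score(price, 70, 8))
```

With Q1 = 23 and Q3 = 35, the upper fence is 53, so a winning score of 55 falls outside it. This agrees with answer 3.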


Chapter 3

Section 3.1

1. (1) [Figure: Tree diagram; Yes, No, and Not sure, each branching to M and F; 6 outcomes]
Let Y = Yes, N = No, NS = Not sure, M = Male,
F = Female.
Sample space = {YM, YF, NM, NF, NSM, NSF}

(2) [Figure: Tree diagram; Yes, No, and Not sure, each branching to 18-34, 35-49, and 50+; 9 outcomes]
Let Y = Yes, N = No, NS = Not sure,
50+ = 50 and older.
Sample space = {Y18-34, Y35-49, Y50+, N18-34,
N35-49, N50+, NS18-34, NS35-49, NS50+}

(3) [Figure: Tree diagram; Yes, No, and Not sure, each branching to NE, S, MW, and W; 12 outcomes]
Let Y = Yes, N = No, NS = Not sure,
NE = Northeast, S = South, MW = Midwest,
W = West.
Sample space = {YNE, YS, YMW, YW, NNE, NS,
NMW, NW, NSNE, NSS, NSMW, NSW}

2. (1) 6; Not a simple event because it is an event that consists 
of more than a single outcome. 

(2) 1; Simple event because it is an event that consists of a
single outcome.


3. [Figure: Tree diagram (recovered branch labels: M; W, R, B, LG, LT)]

4. (1) 308,915,776 (2) 165,765,600 (3) 261,390,272 (4) 106,932,384


5. (1) 0.019 (2) 0.25 (3) 1 6. 0.061 7. 0.261
8. Empirical probability 9. 0.84 10. 0.313
11. 1/10,000,000


Section 3.2 

1. 0.488 

2. (1) Dependent (2) Independent 
3. (1) 0.723 (2) 0.059 

4. (1) 0.729 (2) 0.001 (3) 0.999 
5. (1) 0.163 (2) 0.488 


Neither event is unusual because neither probability is less
than or equal to 0.05.


Section 3.3 

1. (1) Not mutually exclusive; The events can occur at the same 
time. 

(2) Mutually exclusive; The events cannot occur at the same
time.

2. (1) 0.667 (2) 0.423 3. 0.222 

4. (1) 0.149 (2) 0.149 (3) 0.910 (4) 0.499 5. 0.806 

Section 3.4 

1. 3,628,800 2. 336 3. 11,880 4. 77,597,520

5. 1140 6. 0.003 7. 0.0009 8. 0.045
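Several counts here agree with standard counting formulas: the Fundamental Counting Principle, permutations nPr = n!/(n − r)!, and combinations nCr = n!/(r!(n − r)!). The exercise statements are not reproduced in this section, so which formula each one uses is inferred from the numbers. The sketch relies on `math.perm` and `math.comb` (Python 3.8+).

```python
import math

# Section 3.1, answer 4 (setups inferred from the listed values)
print(f"{26 ** 6:,}")             # 26 choices in each of 6 positions, repetition allowed
print(f"{math.perm(26, 6):,}")    # 6 positions filled from 26 choices, no repetition

# Section 3.4 values that match standard counting formulas
print(f"{math.factorial(10):,}")  # 10!
print(math.perm(8, 3))            # 8P3 = 8! / (8 - 3)!
print(f"{math.perm(12, 4):,}")    # 12P4
print(math.comb(20, 3))           # 20C3 = 20! / (3! * 17!)
```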

Chapter 4 

Section 4.1 


1. (1) The random variable is continuous because x can be any 
speed up to the maximum speed of a rocket. 
(2) The random variable is discrete because the number of 
calves born on a farm in one year is countable. 
(3) The random variable is discrete because the number of 
days of rain for the next three days is countable. 


2.
x | f | P(x)
0 | 16 | 0.16
1 | 19 | 0.19
2 | 15 | 0.15
3 | 21 | 0.21
4 | 9 | 0.09
5 | 10 | 0.10
6 | 8 | 0.08
7 | 2 | 0.02
n = 100 | ΣP(x) = 1
[Figure: Histogram, New Employee Sales (P(x) vs. Number of sales per day)]


3. Each P(x) is between 0 and 1 and ΣP(x) = 1. Because
both conditions are met, the distribution is a probability
distribution.

4. (1) Probability distribution; The probability of each outcome
is between 0 and 1, and the sum of all the probabilities is 1.
(2) Not a probability distribution; The sum of all the
probabilities is not 1.

5. μ = 2.6; On average, a new employee makes 2.6 sales per
day.

6. σ² ≈ 3.7; σ ≈ 1.9

7. −$3.08; Because the expected value is negative, you can
expect to lose an average of $3.08 for each ticket you buy.
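Answers 2, 5, and 6 follow from P(x) = f/n, μ = ΣxP(x), and σ² = Σ(x − μ)²P(x). A minimal sketch using the New Employee Sales frequencies (the function name `distribution_summary` is hypothetical):

```python
import math

def distribution_summary(frequencies):
    """frequencies maps each value x to its frequency f.

    Returns (probabilities, mean, variance, standard deviation).
    """
    n = sum(frequencies.values())
    probs = {x: f / n for x, f in frequencies.items()}            # P(x) = f / n
    mean = sum(x * p for x, p in probs.items())                    # mu = sum of x * P(x)
    variance = sum((x - mean) ** 2 * p for x, p in probs.items())  # sum of (x - mu)^2 * P(x)
    return probs, mean, variance, math.sqrt(variance)

sales = {0: 16, 1: 19, 2: 15, 3: 21, 4: 9, 5: 10, 6: 8, 7: 2}
probs, mu, var, sigma = distribution_summary(sales)
print(round(mu, 1), round(var, 1), round(sigma, 1))
```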


Section 4.2 
1. Binomial experiment
n = 10, p = 0.25, q = 0.75, x = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
2. 0.088

3.
x | P(x)
0 | 0.193
1 | 0.376
2 | 0.293
3 | 0.114
4 | 0.022
5 | 0.002

4. 0.007
5. (1) 0.284 (2) 0.409 (3) 0.591 6. 0.031

7. Reading an e-Book
x | P(x)
0 | 0.269
1 | 0.418
2 | 0.244
3 | 0.063
4 | 0.006
[Figure: Histogram, Reading an e-Book (Probability vs. Number of adults)]

8. μ ≈ 13.6; σ² ≈ 7.6; σ ≈ 2.8; On average, there are about
14 clear days during the month of May. A May with fewer than
8 clear days or more than 19 clear days would be unusual.
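The binomial answers can be checked with P(x) = nCx · pˣ · qⁿ⁻ˣ, μ = np, σ² = npq, and σ = √(npq). The parameters are not stated in this section. n = 4 and p = 0.28 are assumptions chosen because they reproduce the Reading an e-Book table. n = 31 and p = 0.44 are assumptions chosen because they reproduce μ ≈ 13.6 and σ² ≈ 7.6 for clear days in May. Values more than two standard deviations from the mean are treated as unusual.

```python
import math

def binomial_pmf(n, p, x):
    """P(x) = nCx * p**x * q**(n - x) for a binomial experiment."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_summary(n, p):
    """Mean, variance, and standard deviation of a binomial distribution."""
    mean = n * p
    variance = n * p * (1 - p)
    return mean, variance, math.sqrt(variance)

# Assumption: n = 4 and p = 0.28 (these reproduce the Reading an e-Book table)
print([round(binomial_pmf(4, 0.28, x), 3) for x in range(5)])

# Assumption: n = 31 days in May and p = 0.44 (these reproduce answer 8)
mu, var, sigma = binomial_summary(31, 0.44)
print(round(mu, 1), round(var, 1), round(sigma, 1))
print("unusual below", round(mu - 2 * sigma, 1), "or above", round(mu + 2 * sigma, 1))
```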


Section 4.3 


1. 0.066 2. 0.185 3. 0.0002


Chapter 5 
Section 5.1 


1. (1) Curve B has the greatest mean.

(2) Curve C is more spread out, so curve C has the greatest
standard deviation.

2. μ = 300; σ = 37 3. (1) 0.0143 (2) 0.9850

4. 0.9834 5. 0.9846 6. 0.0733


Section 5.2 


1. 0.1957 2. 0.7357; 147 3. 0.4352


Section 5.3 


1. (1) −1.77 (2) 1.96

2. (1) −1.28 (2) −0.84 (3) 2.33

3. (1) 1705 pounds (2) 97 pounds (3) 60.7 pounds

1705 pounds is to the left of the mean, 97 pounds is to the
right of the mean, and 60.7 pounds is to the right of the mean.

4. The longest braking distance one of these cars could have and
still be in the bottom 1% is about 117 feet.

5. The maximum length of time an employee could have
worked and still be laid off is about 8.5 years.
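Answer 2's values coincide with the 10th, 20th, and 99th percentiles of the standard normal distribution. This is an inference from the numbers, since the exercise is not reproduced here. A sketch using `statistics.NormalDist.inv_cdf` (Python 3.8+), which returns the z-score with a given cumulative area to its left (the function name is hypothetical):

```python
from statistics import NormalDist

def z_for_percentile(percentile):
    """z-score whose cumulative area (area to its left) equals percentile / 100."""
    return NormalDist().inv_cdf(percentile / 100)  # standard normal: mean 0, sd 1

for k in (10, 20, 99):
    print(f"P{k}: z = {z_for_percentile(k):.2f}")
```

Converting such a z-score back to a data value uses x = μ + zσ, which is the step behind answers 3 to 5.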


Section 5.4

1.
Sample | Mean | Sample | Mean
1, 1, 1 | 1 | 3, 3, 5 | 3.67
1, 1, 3 | 1.67 | 3, 5, 1 | 3
1, 1, 5 | 2.33 | 3, 5, 3 | 3.67
1, 3, 1 | 1.67 | 3, 5, 5 | 4.33
1, 3, 3 | 2.33 | 5, 1, 1 | 2.33
1, 3, 5 | 3 | 5, 1, 3 | 3
1, 5, 1 | 2.33 | 5, 1, 5 | 3.67
1, 5, 3 | 3 | 5, 3, 1 | 3
1, 5, 5 | 3.67 | 5, 3, 3 | 3.67
3, 1, 1 | 1.67 | 5, 3, 5 | 4.33
3, 1, 3 | 2.33 | 5, 5, 1 | 3.67
3, 1, 5 | 3 | 5, 5, 3 | 4.33
3, 3, 1 | 2.33 | 5, 5, 5 | 5
3, 3, 3 | 3 | |

2. [Figure: Sampling distribution of sample means (horizontal axis: Mean of sleep times, in hours)]

With a smaller sample size, the mean stays the same but the
standard deviation increases.

3. μx̄ = 3.5, σx̄ = 0.05
[Figure: Sampling distribution of sample means (horizontal axis: Mean diameter, in feet)]

4. 0.9744

5. 0.7673

6. 0.5832; 0.7454
There is about a 58% chance that an LCD computer monitor
will cost less than $200. There is about a 75% chance that
the mean of a sample of 10 LCD computer monitors is less
than $200.
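The table in answer 1 lists every sample of size 3 drawn with replacement from the population {1, 3, 5}. A short enumeration with `itertools.product`, which generates the samples in the same order as the table, checks two properties. The mean of the sample means equals the population mean, and the standard deviation of the sample means equals σ/√n:

```python
import math
from itertools import product
from statistics import fmean, pstdev

population = [1, 3, 5]
n = 3

# All ordered samples of size 3 drawn with replacement: 3**3 = 27 samples
samples = list(product(population, repeat=n))
sample_means = [fmean(s) for s in samples]

print(len(samples))
print(fmean(sample_means), fmean(population))                   # mean of sample means vs. population mean
print(pstdev(sample_means), pstdev(population) / math.sqrt(n))  # spread of sample means vs. sigma / sqrt(n)
```

Because the standard error σ/√n grows as n shrinks, a smaller sample size leaves the mean of the sample means unchanged but widens their spread, as answer 2 states.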




Section 5.5

1. Because np and nq are greater than 5, a normal distribution
can be used.
2. (1) 56.5 < x < 83.5 (2) x < 54.5 3. 0.3707
4. 0.0281 5. 0.0083

Chapter 6

Section 6.1

1. 20.9 2. 0.8 hour
3. (20.1, 21.7); This confidence interval is wider than the one
found in Example 3.
4. (20.6, 21.5); (20.5, 21.6); (20.4, 21.7); As the confidence level
increases, so does the width of the interval.
5. (22.4, 23.4) [Tech: (22.5, 23.4)]; Because of the larger sample
size, the confidence interval is slightly narrower.
6. 37; Because of the larger margin of error, the sample size
needed is smaller.

Section 6.2

1. 1.721 2. (157.6, 166.4); (154.6, 169.4)
3. (9.08, 10.42); (8.94, 10.56); The 90% confidence interval is
slightly narrower.
4. Use a t-distribution because σ is not known and the population
is normally distributed.

Section 6.3

1. 15% 2. (0.14, 0.16) 3. (0.454, 0.546)
4. (1) 1692 (2) 400

Section 6.4

1. χ²R = 42.557, χ²L = 17.708
2. Population variance: (0.98, 2.36), (0.91, 2.60)
Standard deviation: (0.99, 1.54), (0.96, 1.61)

Chapter 7

Section 7.1

1. (1) The mean is not 74 months.
μ ≠ 74
H₀: μ = 74; Hₐ: μ ≠ 74 (claim)
(2) The variance is less than or equal to 2.7.
σ² ≤ 2.7
H₀: σ² ≤ 2.7 (claim); Hₐ: σ² > 2.7
(3) The proportion is more than 24%.
p > 0.24
H₀: p ≤ 0.24; Hₐ: p > 0.24 (claim)

2. A type I error will occur when the actual proportion is less
than or equal to 0.01, but you reject H₀.
A type II error will occur when the actual proportion is
greater than 0.01, but you fail to reject H₀.
A type II error is more serious because you would be
misleading the consumer, possibly causing serious injury or
death.

3. (1) H₀: The mean life of a certain type of automobile battery
is 74 months.
Hₐ: The mean life of a certain type of automobile battery
is not 74 months.
H₀: μ = 74; Hₐ: μ ≠ 74
Two-tailed
[Figure: Shaded P-value areas in both tails (z-axis)]
(2) H₀: The variance of the life of a manufacturer’s home
theater systems is less than or equal to 2.7.
Hₐ: The variance of the life of a manufacturer’s home
theater systems is greater than 2.7.
H₀: σ² ≤ 2.7; Hₐ: σ² > 2.7
Right-tailed
[Figure: Shaded P-value area in the right tail (z-axis)]
(3) H₀: The proportion of homeowners who feel their house
is too small for their family is less than or equal to
24%.
Hₐ: The proportion of homeowners who feel their house
is too small for their family is greater than 24%.
H₀: p ≤ 0.24; Hₐ: p > 0.24
Right-tailed
[Figure: Shaded P-value area in the right tail (z-axis)]

4. (1) There is enough evidence to support the claim that the
mean life of a certain type of automobile battery is not
74 months.
There is not enough evidence to support the claim that
the mean life of a certain type of automobile battery is
not 74 months.
(2) There is enough evidence to support the realtor’s claim
that the proportion of homeowners who feel their house
is too small for their family is more than 24%.
There is not enough evidence to support the realtor’s
claim that the proportion of homeowners who feel their
house is too small for their family is more than 24%.

5. (1) H₀: μ ≥ 650; Hₐ: μ < 650 (claim)
If you reject H₀, then you will support the claim that the
mean repair cost per automobile is less than $650.
(2) H₀: μ = 98.6 (claim); Hₐ: μ ≠ 98.6
If you reject H₀, then you will reject the claim that the
mean temperature is about 98.6°F.


Section 7.2 


1. (1) Fail to reject H₀. (2) Reject H₀.

2. 0.0436; Reject H₀ because 0.0436 < 0.05.

3. 0.1010; Fail to reject H₀ because 0.1010 > 0.01.

4. There is enough evidence at the 5% level of significance 
to support the claim that the average speed is greater than 
35 miles per hour. 

5. There is not enough evidence at the 1% level of significance 
to support the claim that the mean number of workdays 
missed due to illness or injury in the past 12 months is 
3.5 days. 

6. Fail to reject H₀.

7. z₀ = −1.28; Rejection region: z < −1.28

8. −z₀ = −1.75, z₀ = 1.75
Rejection regions: z < −1.75, z > 1.75

9. There is enough evidence at the 1% level of significance 
to support the claim that the mean workday is less than 
8.5 hours. 

10. There is not enough evidence at the 1% level of significance 
to reject the claim that the mean cost of raising a child (age 2 
and under) by married-couple families in the United States 
is $14,050. 
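Answers 2, 3, 7, and 8 use the P-value decision rule (reject H₀ when P ≤ α) and standard normal critical values. The significance levels α = 0.10 (left-tailed) and α = 0.08 (two-tailed) are assumptions, inferred because they reproduce z₀ = −1.28 and ±1.75. The function names are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution: mean 0, sd 1

def left_tailed_critical_value(alpha):
    """Critical value z0 with area alpha to its left."""
    return std_normal.inv_cdf(alpha)

def two_tailed_critical_values(alpha):
    """Critical values -z0 and z0 with area alpha / 2 in each tail."""
    z0 = std_normal.inv_cdf(1 - alpha / 2)
    return -z0, z0

def decision(p_value, alpha):
    """P-value decision rule: reject H0 when P <= alpha."""
    return "Reject H0" if p_value <= alpha else "Fail to reject H0"

print(round(left_tailed_critical_value(0.10), 2))
print([round(z, 2) for z in two_tailed_critical_values(0.08)])
print(decision(0.0436, 0.05), "|", decision(0.1010, 0.01))
```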




Section 7.3 


1. −2.650 2. 1.397 3. −2.131, 2.131

4. There is enough evidence at the 10% level of significance to 
support the claim that the mean age of a used car sold in the 
last 12 months is less than 4.1 years. 

5. There is enough evidence at the 1% level of significance to 
reject the company’s claim that the mean conductivity of the 
river is 1890 milligrams per liter. 

6. There is not enough evidence at the 5% level of significance 
to reject the office’s claim that the mean wait time is at most 
18 minutes. 


Section 7.4 


1. There is not enough evidence at the 1% level of significance 
to support the claim that more than 90% of U.S. adults have 
access to a smartphone. 

2. There is enough evidence at the 10% level of significance 
to reject the claim that 67% of U.S. adults believe that
doctors prescribing antibiotics for viral infections for 
which antibiotics are not effective is a significant cause of 
drug-resistant superbugs. 


Section 7.5 


1. χ²R = 33.409 2. χ²L = 17.708

3. χ²L = 27.991, χ²R = 79.490

4. There is enough evidence at the 1% level of significance 
to reject the bottling company’s claim that the variance of 
the amount of sports drink in a 12-ounce bottle is no more 
than 0.40. 




5. There is not enough evidence at the 5% level of significance
to support the police chief’s claim that the standard deviation
of the lengths of response times is less than 3.7 minutes.

6. There is enough evidence at the 10% level of significance to
reject the company’s claim that the variance of the weight
losses of the users is 25.5.


Chapter 8 
Section 8.1 


1. (1) Independent (2) Dependent

2. There is not enough evidence at the 10% level of significance
to support the claim that there is a difference in the mean
annual wages for forensic science technicians working for
local and state governments.

3. There is not enough evidence at the 5% level of significance
to support the travel agency’s claim that the average daily
cost of meals and lodging for vacationing in Alaska is greater
than the average daily cost in Colorado.


Section 8.2 


1. There is enough evidence at the 5% level of significance
to support the claim that there is a difference in the mean
annual earnings based on level of education.

2. There is not enough evidence at the 10% level of significance
to support the manufacturer’s claim that the mean driving
cost per mile of its minivans is less than that of its leading
competitor.


Section 8.3 


1. There is not enough evidence at the 5% level of significance
to support the claim that athletes can decrease their times in
the 40-yard dash.

2. There is not enough evidence at the 5% level of significance
to support the claim that the drug changes the body’s
temperature.


Section 8.4 


1. There is not enough evidence at the 5% level of significance
to support the claim that there is a difference between the
proportion of 40- to 49-year-olds who are yoga users and the
proportion of 40- to 49-year-olds who are non-yoga users.

2. There is enough evidence at the 5% level of significance
to support the claim that the proportion of yoga users with
incomes of $20,000 to $34,499 is less than the proportion of
non-yoga users with incomes of $20,000 to $34,499.




Chapter 9 


Section 9.1

1. [Figure: Scatter plot, Annual contribution (in thousands of dollars) vs. Years out of school]

It appears that there is a negative linear correlation. As
the number of years out of school increases, the annual
contribution tends to decrease.

2. [Figure: Scatter plot, Pulse rate (in beats per minute) vs. Height (in inches)]

It appears that there is no linear correlation between height
and pulse rate.

3. [Figure: Scatter plot, average attendance per home game vs. Salary (in millions of dollars)]

It appears that there is a positive linear correlation. As the
team salary increases, the average attendance per home
game tends to increase.

4. Because r is close to −1, this suggests a strong negative linear
correlation. As the number of years out of school increases,
the annual contribution tends to decrease.

5. 0.775; Because r is close to 1, this suggests a strong positive 
linear correlation. As the team salaries increase, the average 
attendance per home game tends to increase. 

6. |r| ≈ 0.908 > 0.875; The correlation is significant.

There is enough evidence at the 1% level of significance 
to conclude that there is a significant linear correlation 
between the number of years out of school and the annual 
contribution. 

7. There is enough evidence at the 1% level of significance to 
conclude that there is a significant linear correlation between 
the salaries and average attendances per home game for the 
teams in Major League Baseball. 


Section 9.2 


1. ŷ = −0.380x + 12.876
2. ŷ = 108.022x + 16,586.282
3. (1) 58.645 minutes (2) 75.120 minutes


Section 9.3 


1. 0.958; About 95.8% of the variation in the times is explained. 
About 4.2% of the variation is unexplained. 

2. 6.218 

3. 477.553 < y < 1230.799
You can be 95% confident that when the gross domestic
product is $4 trillion, the carbon dioxide emissions will be
between 477.553 and 1230.799 million metric tons.


Section 9.4 

1. ŷ = 46.385 + 0.540x₁ − 4.897x₂

2. (1) 90 (2) 74 (3) 81 

Chapter 10 

Section 10.1 

1.
Tax preparation method | % of people | Expected frequency
Accountant | 24% | 120
By hand | 20% | 100
Computer software | 35% | 175
Friend/family | 6% | 30
Tax preparation service | 15% | 75


2. There is not enough evidence at the 5% level of significance 
to support the sociologist’s claim that the age distribution 
differs from the age distribution 10 years ago. 

3. There is enough evidence at the 5% level of significance 
to reject the claim that the distribution of different-colored 
candies in bags of peanut M&M’s is uniform. 
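In a chi-square goodness-of-fit test, the expected frequency for each category is E = np. The sample size n = 500 is an assumption inferred from the table in answer 1, where 24% of 500 gives 120. The function name is hypothetical.

```python
def expected_frequencies(n, percentages):
    """Goodness-of-fit expected frequencies: E = n * p for each category."""
    return {category: n * pct / 100 for category, pct in percentages.items()}

tax_methods = {"Accountant": 24, "By hand": 20, "Computer software": 35,
               "Friend/family": 6, "Tax preparation service": 15}

# Assumption: n = 500, inferred from the table (24% of 500 is 120)
for method, e in expected_frequencies(500, tax_methods).items():
    print(f"{method}: {e:g}")
```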


Section 10.2 


1. E1,1 = 44.4, E1,2 = 97.2, E1,3 = 16.8, E1,4 = 21.6,
E2,1 = 29.6, E2,2 = 64.8, E2,3 = 11.2, E2,4 = 14.4

2. There is not enough evidence at the 1% level of significance 
for the consultant to conclude that travel concern is dependent 
on travel purpose. 

3. There is enough evidence at the 1% level of significance to 
conclude that whether or not a tax credit would influence an 
adult to purchase a hybrid vehicle is dependent on age. 


Section 10.3 


1. 2.45 2. 18.31 

3. There is enough evidence at the 1% level of significance 
to support the researcher’s claim that a specially treated 
intravenous solution decreases the variance of the time 
required for nutrients to enter the bloodstream. 

4. There is not enough evidence at the 1% level of significance 
to reject the biologist’s claim that the pH levels of the soil in 
the two geographic locations have equal standard deviations. 


Section 10.4 


1. There is enough evidence at the 5% level of significance for 
the analyst to conclude that there is a difference in the mean 
monthly sales among the sales regions. 

2. There is not enough evidence at the 5% level of significance 
to conclude that there is a difference in the means of the 
GPAs. 


Appendix A 


1. (1) 0.4857 (2) z = ±2.17
2. 0.9834 3. 0.9846 4. 0.0733 


Appendix C 


1.


Because the points do not appear to be approximately linear 
and there is an outlier, you can conclude that the sample 
data do not come from a population that has a normal 
distribution. 




ODD ANSWERS

Chapter 1

Section 1.1 (page 28)

1. A sample is a subset of a population.

3. A parameter is a numerical description of a population
characteristic. A statistic is a numerical description of a
sample characteristic.

5. False. A statistic is a numerical description of a sample
characteristic.

7. True

9. False. A population is the collection of all outcomes,
responses, measurements, or counts that are of interest.

11. Sample, because the collection of 95 shopkeepers is a subset
of the population of 550 shopkeepers in the commercial
complex.

13. Population, because it is a collection of the heights of each
athlete participating in the Summer Olympics.

15. Sample, because the collection of the 10 patients is a subset
of the population of 50 patients at the clinic.

17. Population, because it is a collection of all the gamers’ scores
in the tournament.

19. Sample, because the collection of the top 10 taxpayers is a
subset within the population of the country’s total taxpayers.

21. Population: Parties of registered voters
Sample: Parties of registered voters who respond to a survey

23. Population: Ages of adults in the United States who own
automobiles
Sample: Ages of adults in the United States who own Honda
automobiles

25. Population: Collection of the responses of all U.S. adults
Sample: Collection of the responses of the 1020 U.S. adults
surveyed
Sample data set: 42% of adults who said they trust their
political leaders and 58% who said they did not

27. Population: Collection of the influenza immunization status
of all adults in the United States
Sample: Collection of the influenza immunization status of
the 3301 U.S. adults surveyed
Sample data set: 39% of U.S. adults who received an
influenza vaccine and 61% who did not

29. Population: Collection of the average hourly billing rates of
all U.S. law firms
Sample: Collection of the average hourly billing rates for
partners of the 159 U.S. law firms surveyed
Sample data set: The average hourly billing rate for
partners of 159 U.S. law firms is $604.

31. Population: Collection of the blood donations collected globally
Sample: Collection of the 112.5 million blood donations
collected globally
Sample data set: 50% of the donors who belong to high-income
countries and 50% who do not


A40 


33. 


35. 


37. 


39. 
41. 


43. 


45. 


47. 


49. 


Section 1.2 


. Nominal and ordinal 
. False. Data at the ordinal level can be qualitative or 


Population: Collection of the 1000 mutual funds listed on a 

recognized stock exchange 

Sample: Collection of the 134 mutual funds of the 1000 

mutual funds listed on a recognized stock exchange 

Sample data set: Best mutual funds out of the 134 mutual 

funds listed on a recognized stock exchange 

Population Parameter. Forty out of 500 total students is a 

numerical description of the students who received a C grade. 

Sample Statistic. The value two million is a numerical 

description of a sample of civilian casualties during World 

War II. 

Population Parameter. The entire population of employees 

working in the organization has been reviewed. 

Sample statistic. The value 80% is a numerical description of 

a sample of U.S. adults. 

The statement “50% are collected from high-income 

countries” is an example of descriptive statistics. Using 

inferential statistics, you may conclude that an association 
exists between income and the number of blood donations 
in a country. 

Answers will vary. 

The inference may incorrectly imply that exercise increases 

a person’s cognitive ability. The study shows a slower decline 

in cognitive ability, not an increase. 

(a) The sample is the results on the standardized test by the 
participants in the study. 

(b) The population is the collection of all the results of the 
standardized test. 

(c) The statement “the closer that participants were to an 
optimal sleep duration target, the better they performed 
on a standardized test” is an example of descriptive 
statistics. 

(d) Individuals who obtain optimal sleep will be more 
likely to perform better on a standardized test than they 
would without optimal sleep. 


(page 35) 


quantitative. 


. False. More types of calculations can be performed 


with data at the interval level than with data at the nominal 
level. 


. Qualitative, because breeds of horses are attributes. 
. Quantitative, because blood pressure levels are numerical 


measurements. 


- Qualitative, because colors are attributes. 
. Quantitative, because weight is a numerical measurement. 
. Ordinal. Data can be arranged in order, but the differences 


between data entries are not meaningful. 


17. 
19. 
21. 
23. 
25. 
27. 
29. 
31. 


33. 


Section 1.3 
1. 


15. 
19. 


21. 


Nominal. No mathematical computations can be made, and 
data are categorized using numbers. 

Ordinal. Data can be arranged in order, but the differences 
between data entries are not meaningful. 

Horizontal: Nominal; Vertical: Ratio 

Horizontal: Nominal; Vertical: Ratio 

(a) Ordinal (b) Ratio (c) Nominal  (d) Interval 
Quantitative. Ratio. A ratio of two data entries can be formed, 
so one data entry can be expressed as a multiple of another. 
Qualitative. Ordinal. Data can be arranged in order, but the 
differences between data entries are not meaningful. 
Qualitative. Ordinal. Data can be arranged in order, but the 
differences between data entries are not meaningful. 

An inherent zero is a zero that implies “none.” Answers 
will vary. 


(page 46) 


In an experiment, a treatment is applied to part of a 
population and responses are observed. In an observational 
study, a researcher measures characteristics of interest of a 
part of a population but does not change existing conditions. 


In a random sample, every member of the population has an equal chance of being selected. In a simple random sample, every possible sample of the same size has an equal chance of being selected.

False. A placebo is a fake treatment.

False. Using stratified sampling guarantees that members of each group within a population will be sampled.

False. A systematic sample is selected by ordering a population in some way and then selecting members of the population at regular intervals.

Observational study. The study does not apply a treatment to the adults.

Experiment. The study applies a treatment (different photographs) to the subjects.
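The sampling techniques defined in these answers can be illustrated with a short Python sketch that uses only the standard `random` module. This is an illustration under stated assumptions, not a procedure from the text: the population of ID numbers, the odd/even strata, and the function names are all hypothetical.

```python
import random


def simple_random_sample(population, n, rng):
    # Draw n members without replacement; every subset of size n is equally likely.
    return rng.sample(population, n)


def systematic_sample(population, n, rng):
    # Take every k-th member after a random start within the first interval.
    k = len(population) // n
    start = rng.randrange(k)
    return population[start::k][:n]


def stratified_sample(population, stratum_of, n_per_stratum, rng):
    # Group members by stratum, then draw a simple random sample from each group,
    # which guarantees that every stratum is represented.
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample


if __name__ == "__main__":
    rng = random.Random(1)             # fixed seed for a reproducible run
    population = list(range(1, 101))   # hypothetical population: IDs 1-100
    print(simple_random_sample(population, 5, rng))
    print(systematic_sample(population, 10, rng))
    print(stratified_sample(population, lambda x: x % 2, 3, rng))  # odd vs. even IDs
```

Passing an explicit `random.Random` instance keeps each run reproducible, which is convenient when the same sample must be re-created for checking.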

Answers will vary. 17. Answers will vary. 

(a) The experimental units are the 500 females ages 25 
to 45 years old who suffer from migraine headaches. 
The treatment is the new drug used to treat migraine 
headaches. 

(b) A problem with the design is that the sample is not 
representative of the entire population because only 
females ages 25 to 45 were used. To increase validity, use 
a stratified sample. 

(c) For the experiment to be double-blind, neither the 
subjects nor the company would know whether the 
subjects are receiving the drug or the placebo. 

Sample answer: Treatment group: Lewis, Dennis, Jennifer, Ronald, Edgar, Kate, Lara, William, and Raj. Control group: Alice, Edwin, Mercer, Bill, Zoya, Bertha, Ahmed, Harry, and Arthur.

A random number generator was used.


ODD ANSWERS A41

23. Cluster sampling is used because the constituency is divided into areas, and 12 areas are then entirely selected. A possible source of bias is that the problems of residents in one area might differ from those of residents in other areas.

25. Cluster sampling is used because the disaster area is divided into grids, and 30 grids are then entirely selected. A possible source of bias is that certain grids may have been much more severely damaged than others.

27. Simple random sampling is used because each house number has an equal chance of being selected, and all samples of 1638 house numbers have an equal chance of being selected. The sample is unbiased.

29. Sampling, because the population of mobile phone purchasers is too large to easily record the model each one bought. Random sampling would be advised because it would be easy to select mobile phone purchasers randomly and then record the models they bought to find the most popular one.

31. The question is biased because it already suggests that eating whole-grain foods improves your health. The question could be rewritten as "How does eating whole-grain foods affect your health?"

33. The question is biased because it already suggests that listening to music while studying increases retention. The question could be rewritten as "Does listening to music while studying have an effect on retention?"

35. Answers will vary.

37. Open Question

Advantage: Allows respondents to express some depth and shades of meaning in their answers. Allows for new solutions to be introduced.

Disadvantage: Not easily quantified, and surveys are difficult to compare.

Closed Question

Advantage: Easy to analyze results.

Disadvantage: May not provide appropriate alternatives and may influence the opinion of the respondent.

Section 1.3 Activity (page 49)

1. Answers will vary. The list contains one number at least twice.

2. The minimum is 1, the maximum is 731, and the number of samples is 8. Answers will vary.

Uses and Abuses for Chapter 1 (page 50)

1. Answers will vary. 2. Answers will vary.

Review Exercises for Chapter 1 (page 52)

1. Population: Collection of the responses of all U.S. adults
Sample: Collection of the responses of the 4787 U.S. adults who were sampled

Sample data set: 15% of adults who use ride-hailing applications and 85% who do not


3. Population: Collection of the responses of all U.S. adults
Sample: Collection of the responses of the 2223 U.S. adults who were sampled

Sample data set: 62% of adults who would encourage a child to pursue a career as a video game developer or designer and 38% who would not

Population parameter. The value $22.7 million is a numerical description of the total infrastructure-strengthening investments.

Parameter. The 12 students minoring in math is a numerical description of all physics majors at a university.

The statement "62% would encourage a child to pursue a career as a video game developer or designer" is an example of descriptive statistics. An inference drawn from the sample is that a majority of people would encourage a child to pursue a career as a video game developer or designer.

Quantitative, because ages are numerical measurements.

Quantitative, because revenues are numerical measurements.

Interval. The data can be ordered and meaningful differences can be calculated, but it does not make sense to say that 84 degrees is 1.05 times as hot as 80 degrees.

Nominal. The data are qualitative and cannot be arranged in a meaningful order.

Experiment. The study applies a treatment (a drug to treat hypertension in patients with obstructive sleep apnea) to the subjects.

Sample answer: The subjects could be split into male and female groups and then be randomly assigned to each of the five treatment groups.

Simple random sampling is used because random telephone numbers were generated and called. A potential source of bias is that telephone sampling only samples individuals who have telephones, who are available, and who are willing to respond.


(c) Sample statistic. The value 25% is a numerical description of a sample of small business owners.

(a) Qualitative, because debit card personal identification numbers are labels and it does not make sense to find differences between numbers.
(b) Quantitative, because final scores are numerical measurements.

(a) Ordinal, because badge numbers can be ordered and often indicate seniority of service, but no meaningful mathematical computation can be performed.
(b) Ratio, because one data entry can be expressed as a multiple of another.
(c) Ordinal, because data can be arranged in order, but the differences between data entries make no sense.
(d) Interval, because meaningful differences between entries can be calculated but a zero entry is not an inherent zero.

(a) Observational study. The study does not attempt to influence the responses of the subjects and there is no treatment.
(b) Experiment. The study applies a treatment (multivitamin) to the subjects.

Randomized block design

(a) Convenience sampling, because all of the people sampled are in one convenient location.
(b) Systematic sampling, because every tenth machine part is sampled.
(c) Stratified sampling, because the population is first stratified and then a sample is collected from each stratum.

Convenience sampling. People at campgrounds may be strongly against air pollution because they are at an outdoor location.


25. Cluster sampling is used because each district is considered a cluster and every pregnant woman in a selected district is surveyed. A potential source of bias is that the selected districts may not be representative of the entire area.

27. Stratified sampling is used because the population is divided by religious groups and then 50 voters are randomly selected from each religious group.

29. Answers will vary.

Real Statistics—Real Decisions for Chapter 1 (page 56)

1. (a)-(b) Answers will vary.
(c) Sample answer: Use surveys.
(d) Sample answer: You may take too large a percentage of your sample from a subgroup of the population that is relatively small.

2. (a) Sample answer: Qualitative, because questions will ask for demographics and the sample questions have nonnumerical categories.
(b) Sample answer: Nominal and ordinal, because the results can be put in categories and the categories can be ranked.
(c) Sample (d) Statistics

3. (a) Sample answer: Sample includes only members of the population with access to the Internet.
(b) Answers will vary.


Quiz for Chapter 1 (page 54) 


1. Population: Collection of the school performance of all 
Korean adolescents 
Sample: Collection of the school performance of the 
359,264 Korean adolescents in the study 
2. (a) Sample statistic. The value 52% is a numerical 
description of a sample of U.S. adults. 
(b) Population parameter. The 90% of members that 
approved the contract of the new president is a 
numerical description of all Board of Trustees members. 




Chapter 2

Section 2.1 (page 61)

1. Organizing the data into a frequency distribution may make patterns within the data more evident. Sometimes it is easier to identify patterns of a data set by looking at a graph of the frequency distribution.

3. Class limits determine which numbers can belong to each class. Class boundaries are the numbers that separate classes without forming gaps between them.

5. The sum of the relative frequencies must be 1 or 100% because it is the sum of all portions or percentages of the data.

7. The class width is the difference between the lower or upper limits of consecutive classes.

9. False. An ogive is a graph that displays cumulative frequencies.

11. Class width = 8; Lower class limits: 9, 17, 25, 33, 41, 49, 57; Upper class limits: 16, 24, 32, 40, 48, 56, 64

13. Class width = 15; Lower class limits: 17, 32, 47, 62, 77, 92, 107, 122; Upper class limits: 31, 46, 61, 76, 91, 106, 121, 136

15. (a) 11
(b) and (c)
Class | Midpoint | Class boundaries
0-10 | 5 | −0.5-10.5
11-21 | 16 | 10.5-21.5
22-32 | 27 | 21.5-32.5
33-43 | 38 | 32.5-43.5
44-54 | 49 | 43.5-54.5
55-65 | 60 | 54.5-65.5
66-76 | 71 | 65.5-76.5

17.
Class | Frequency, f | Midpoint | Relative frequency | Cumulative frequency
0-10 | 188 | 5 | 0.15 | 188
11-21 | 372 | 16 | 0.30 | 560
22-32 | 264 | 27 | 0.22 | 824
33-43 | 205 | 38 | 0.17 | 1029
44-54 | 83 | 49 | 0.07 | 1112
55-65 | 76 | 60 | 0.06 | 1188
66-76 | 32 | 71 | 0.03 | 1220
Σf = 1220 | Σ(f/n) = 1

19. (a) 7
(b) Greatest frequency: about 300
Least frequency: about 10
(c) 10
(d) Sample answer: About half of the employee salaries are between $50,000 and $69,000.

21. Class with greatest frequency: 506-510
Classes with least frequency: 474-478

23. (a) Class with greatest relative frequency: 35-36 centimeters
Class with least relative frequency: 39-40 centimeters
(b) Greatest relative frequency ≈ 0.25
Least relative frequency ≈ 0.01
(c) Sample answer: From the graph, 0.25 or 25% of females have a fibula length between 35 and 36 centimeters.

25. (a) 75 (b) 158.5-201.5 pounds

27. (a) 47 (b) 2875 pounds (c) 40 (d) 6

29.
Class | Frequency, f | Midpoint | Relative frequency | Cumulative frequency
8-12 | 5 | 10 | 0.22 | 5
13-17 | 8 | 15 | 0.33 | 13
18-22 | 8 | 20 | 0.33 | 21
23-27 | 2 | 25 | 0.08 | 23
28-32 | 1 | 30 | 0.04 | 24
Σf = 24 | Σ(f/n) = 1
Classes with greatest frequency: 13-17 and 18-22
Class with least frequency: 28-32

31.
Class | Frequency, f | Midpoint | Relative frequency | Cumulative frequency
985-1288 | 3 | 1136.5 | 0.1429 | 3
1289-1592 | 6 | 1440.5 | 0.2857 | 9
1593-1896 | 2 | 1744.5 | 0.0952 | 11
1897-2200 | 4 | 2048.5 | 0.1905 | 15
2201-2504 | 2 | 2352.5 | 0.0952 | 17
2505-2808 | 4 | 2656.5 | 0.1905 | 21
Σf = 21 | Σ(f/n) = 1
[Figure: Production for Manufacturing Plants; horizontal axis: Production (in units), with class midpoints 1136.5 to 2656.5]
Sample answer: The graph shows that production is evenly distributed between 1745 units and 2657 units.
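The columns in these frequency-distribution answers follow a mechanical recipe: midpoint = (lower limit + upper limit)/2, relative frequency = f/n, and cumulative frequency = the running total of f. The sketch below rebuilds the table for exercise 31 from its lower class limits, class width, and frequencies. It assumes whole-number data and equal class widths, so class boundaries sit 0.5 beyond each limit; the function name `frequency_table` is hypothetical.

```python
def frequency_table(lower_limits, class_width, frequencies):
    """Build frequency-distribution rows from lower class limits and counts.

    Assumes whole-number data: each class runs from its lower limit to
    lower + class_width - 1, and boundaries lie 0.5 outside the limits.
    """
    n = sum(frequencies)
    rows = []
    cumulative = 0
    for lower, f in zip(lower_limits, frequencies):
        upper = lower + class_width - 1
        cumulative += f                      # running total of frequencies
        rows.append({
            "class": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),
            "f": f,
            "midpoint": (lower + upper) / 2,
            "relative": f / n,               # share of all n data entries
            "cumulative": cumulative,
        })
    return rows


if __name__ == "__main__":
    # Frequencies from exercise 31; each class has width 304.
    for row in frequency_table([985, 1289, 1593, 1897, 2201, 2505], 304, [3, 6, 2, 4, 2, 4]):
        lo, hi = row["class"]
        print(f"{lo}-{hi}  f={row['f']}  mid={row['midpoint']}  "
              f"rel={row['relative']:.4f}  cum={row['cumulative']}")
```

Rounding the relative frequencies to four places reproduces the printed column (0.1429, 0.2857, 0.0952, ...), and the last cumulative frequency always equals n.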


33.
Class | Frequency, f | Midpoint | Relative frequency | Cumulative frequency
2-4 | 9 | 3 | 0.31 | 9
5-7 | 10 | 6 | 0.33 | 19
8-10 | 4 | 9 | 0.13 | 23
11-13 | 1 | 12 | 0.03 | 24
14-16 | 3 | 15 | 0.10 | 27
17-19 | 2 | 18 | 0.07 | 29
20-22 | 0 | 21 | 0.00 | 29
23-25 | 1 | 24 | 0.03 | 30
Σf = 30 | Σ(f/n) ≈ 1
[Figure: Response Times for Males; horizontal axis: Response Time (in days), with class midpoints 3 to 24]
Sample answer: The graph shows that the response times are evenly distributed between 9 and 18 days or between 2 and 7 days.

35.
Class | Frequency, f | Midpoint | Relative frequency | Cumulative frequency
42-46 | 4 | 44 | 0.0889 | 4
47-51 | 11 | 49 | 0.2444 | 15
52-56 | 14 | 54 | 0.3111 | 29
57-61 | 9 | 59 | 0.2000 | 38
62-66 | 4 | 64 | 0.0889 | 42
67-71 | 3 | 69 | 0.0667 | 45
Σf = 45 | Σ(f/n) ≈ 1
[Figure: Ages of U.S. Presidents at Inauguration; horizontal axis: Age (in years)]
Sample answer: The graph shows that twice as many U.S. presidents were 52 or older at inauguration as were 51 or younger.

37.
Class | Frequency, f | Midpoint | Relative frequency | Cumulative frequency
1-2 | 7 | 1.5 | 0.19 | 7
3-4 | 6 | 3.5 | 0.17 | 13
5-6 | 14 | 5.5 | 0.39 | 27
7-8 | 4 | 7.5 | 0.11 | 31
9-10 | 5 | 9.5 | 0.14 | 36
Σf = 36 | Σ(f/n) ≈ 1
[Figure: Taste Test Ratings; vertical axis: relative frequency; horizontal axis: Ratings, with class midpoints 1.5 to 9.5]
Class with greatest relative frequency: 5-6
Class with least relative frequency: 7-8

39.
Class | Frequency, f | Midpoint | Relative frequency | Cumulative frequency
401-420 | 10 | 410.5 | 0.36 | 10
421-440 | 5 | 430.5 | 0.18 | 15
441-460 | 4 | 450.5 | 0.14 | 19
461-480 | 5 | 470.5 | 0.18 | 24
481-500 | 4 | 490.5 | 0.14 | 28
Σf = 28 | Σ(f/n) ≈ 1
[Figure: Weights of Polar Bears; vertical axis: relative frequency; horizontal axis: Weight (in kilograms), with class midpoints 410.5 to 490.5]
Class with greatest relative frequency: 401-420
Classes with least relative frequency: 441-460 and 481-500


41.
[Figure: Retirement Ages; vertical axis: Relative frequency; horizontal axis: Age (in years), from 50 to 85]

Class | Frequency, f | Midpoint | Relative frequency | Cumulative frequency
49-54 | 4 | 51.5 | 0.11 | 4
55-60 | 12 | 57.5 | 0.34 | 16
61-66 | 8 | 63.5 | 0.23 | 24
67-72 | 6 | 69.5 | 0.17 | 30
73-78 | 3 | 75.5 | 0.09 | 33
79-84 | 2 | 81.5 | 0.06 | 35
Σf = 35 | Σ(f/n) = 1

Location of the greatest increase in frequency: 55-60


43. (a)
Class | Frequency, f | Midpoint | Relative frequency | Cumulative frequency
2-6 | 5 | 4 | 0.2083 | 5
7-11 | 10 | 9 | 0.4167 | 15
12-16 | 6 | 14 | 0.2500 | 21
17-21 | 2 | 19 | 0.0833 | 23
22-26 | 0 | 24 | 0.0000 | 23
27-31 | 1 | 29 | 0.0417 | 24
Σf = 24 | Σ(f/n) = 1

(b)-(e) [Figures: Road Accidents graphs; horizontal axis: Road accident]


45. (a) [Figure: Daily Withdrawals; vertical axis: relative frequency; horizontal axis: Amount (in hundreds of dollars)]

(b) 16.7%, because the sum of the relative frequencies for the last three classes is 0.167.

(c) $9700, because the sum of the relative frequencies for the last two classes is 0.10.

47. [Figures: Histogram (5 Classes), Histogram (10 Classes), and Histogram (20 Classes); horizontal axis: Data value; vertical axis: Frequency]

In general, a greater number of classes better preserves the actual values of the data set but is not as helpful for observing general trends and making conclusions. In choosing the number of classes, an important consideration is the size of the data set. For instance, you would not want to use 20 classes if your data set contained 20 entries. In this particular example, as the number of classes increases, the histogram shows more fluctuation. The histograms with 10 and 20 classes have classes with zero frequencies. Not much is gained by using more than five classes. Therefore, it appears that five classes would be best.

Section 2.2 (page 84)

1. Quantitative: stem-and-leaf plot, dot plot, histogram, scatter plot, time series chart
Qualitative: pie chart, Pareto chart

3. Both the stem-and-leaf plot and the dot plot allow you to see how data are distributed, to determine specific data entries, and to identify unusual data values.

5. b 6. d 7. a 8. c

9. 27, 32, 41, 43, 43, 44, 47, 47, 48, 50, 51, 51, 52, 53, 53, 53, 54, 54, 54, 54, 55, 56, 56, 58, 59, 68, 68, 68, 73, 78, 78, 85
Max: 85; Min: 27
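A stem-and-leaf plot like the ones in this section can be generated by splitting each value into a stem (all but the last digit) and a leaf (the last digit). The sketch below assumes nonnegative whole-number data and uses the data listed in exercise 9; the function names `stem_and_leaf` and `render` are hypothetical.

```python
def stem_and_leaf(data):
    """Map each stem to its sorted leaves (leaf = last digit of a nonnegative integer)."""
    stems = {}
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        stems.setdefault(stem, []).append(leaf)
    # Keep empty stems between the smallest and largest so gaps remain visible.
    for stem in range(min(stems), max(stems) + 1):
        stems.setdefault(stem, [])
    return dict(sorted(stems.items()))


def render(stems):
    # One line per stem, e.g. "4 | 1 3 3 4 7 7 8"
    return "\n".join(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
                     for stem, leaves in stems.items())


if __name__ == "__main__":
    data = [27, 32, 41, 43, 43, 44, 47, 47, 48, 50, 51, 51, 52, 53, 53, 53,
            54, 54, 54, 54, 55, 56, 56, 58, 59, 68, 68, 68, 73, 78, 78, 85]
    print(render(stem_and_leaf(data)))
```

Sorting the data first means each leaf list comes out already ordered, which is what makes the plot double as a sorted listing of the data.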


11. 13, 13, 14, 14, 14, 15, 15, 15, 15, 15, 16, 17, 17, 18, 19
Max: 19; Min: 13

13. Sample answer: Facebook has the most users, and Pinterest has the fewest. Tumblr and Instagram have about the same number of users.

15. Sample answer: The Texter is the least popular driver. The Left-Lane Hog is tolerated more than the Tailgater. The Speedster and the Drifter have the same popularity.

17. Humidity (in percentages)   Key: 18|6 = 18.6
18 | 6
19 | 2 4 6 6 9
20 | 1 5 8 8 8
21 | 0 3 4 5 6 8
22 | 5 6 8
Sample answer: Most of the days had a humidity level of 19.9% to 21.8%.

19. Runs scored   Key: 7|0 = 70
7 | 0 0 1 3 3 5 5 7 8 8 9
8 | 1 2 5 6 7 8
9 | 0 0 1 1 4 8 9
Sample answer: Most runs scored by the batsman were in the 70s.

21. [Stem-and-leaf plot: Incomes (in millions) of Highest Paid Tech CEOs; Key: 1|3 = 13]
Sample answer: Most of the highest-paid tech CEOs have an income of $13 million to $22 million.

23. [Dot plot: Glucose level (in milligrams per deciliter)]
Sample answer: Blood glucose level tends to be between 94 and 101 milligrams per deciliter.

25. [Pie chart: Student Loan Borrowers by Balance Owed in Fourth Quarter 2015; legible labels include $10,001 to $25,000, $25,001 to $50,000, and $50,001 + (15.2%)]
Sample answer: The majority of student loan borrowers owe $25,000 or less.


27. [Figure: FIFA World Cup; countries shown: Brazil, Argentina, Italy, Uruguay, Germany; axes: Country and Number of times won]
Sample answer: Of the five countries, Brazil has won the FIFA World Cup the most times, and Uruguay and Argentina have won it the fewest times.

29. [Figure: Hourly Fees; horizontal axis: Number of hours]
Sample answer: It appears that there is a slight positive relationship between fees and hours of coaching.

31. [Figure: Engineering Degrees]
Sample answer: The number of bachelor's degrees in engineering conferred in the U.S. increased from 2008 to 2015.

33. [Stem-and-leaf plot and dot plot: Heights (in inches); Key: 7|2 = 72]
The dot plot helps you see that the data are clustered from 72 to 76 and 81 to 84, with 75 being the most frequent value. The stem-and-leaf plot helps you see that most values are 75 or greater.


35. [Figures: Favorite Season of U.S. Adults Ages 18 and Older, shown as a pie chart and a Pareto chart (vertical axis: Percent); seasons: Fall, Winter, Spring, Summer]
The pie chart helps you to see the percentages as parts of a whole, with fall being the largest. It also shows that while fall is the largest percentage, it makes up less than half of the pie chart. That means that a majority of U.S. adults ages 18 and older prefer a season other than fall, so it would not be a fair statement to say that most U.S. adults ages 18 and older prefer fall. The Pareto chart helps you to see the rankings of the seasons.

37. (a) The graph is misleading because the large gap from 0 to 90 makes it appear that the sales for the 3rd quarter are disproportionately larger than the other quarters.
(b) [Figure: Sales for Company A; vertical axis: Sales (in thousands of dollars); horizontal axis: Quarter (3rd, 2nd, 1st, 4th)]

39. (a) The graph is misleading because the angle makes it appear as though the 3rd quarter had a larger percent of sales than the others, when the 1st and 3rd quarters have the same percent.
(b) [Figure: Sales for Company B, a pie chart; legible labels: 1st quarter 38%, 4th quarter 20%, 2nd and 3rd quarters]

41. (a) At Law Firm A, the lowest salary was $90,000 and the highest salary was $203,000. At Law Firm B, the lowest salary was $90,000 and the highest salary was $190,000. There are 30 lawyers at Law Firm A and 32 lawyers at Law Firm B.
(b) At Law Firm A, the salaries are clustered at the far ends of the distribution range. At Law Firm B, the salaries are spread out.


43. (a) Key: 2|6 = 26
2 | 6
3 | 1
4 | 0 4 4 5 6 7 9 9
5 | 5 5 5
6 | 3 4 4
7 | 0 1 2 2
[Graphs of the same data, including a frequency graph]
Sample answer: The stem-and-leaf plot, dot plot, frequency histogram, and ogive display the data best because the data are quantitative.

Section 2.3 (page 96)

1. True 3. True 5. Sample answer: 1, 2, 2, 2, 3
7. Sample answer: 2, 5, 7, 9, 35

9. The shape of the distribution is skewed right because the bars have a "tail" to the right.

11. The shape of the distribution is uniform because the bars are approximately the same height.

13. (11), because the distribution of values ranges from 1 to 12 and has (approximately) equal frequencies.

14. (9), because the distribution has values in the thousands and is skewed right due to the few vehicles that have much higher mileages than the majority of the vehicles.

15. (12), because the distribution has a maximum value of 90 and is skewed left due to a few students scoring much lower than the majority of the students.

16. (10), because the distribution is approximately symmetric and the weights range from 80 to 160 pounds.

17. x̄ ≈ 24.8; median = 24.5; mode = 24, 26

19. x̄ ≈ 37.52; median = 34; mode = 28; The mode does not represent the center of the data set because it is small compared with the rest of the data.

21. x̄ ≈ 49.8; median = 50.5; mode = 51

23. x̄ ≈ 7.4; median = 6; mode = 6

25. x̄ = 14.3; median = 9; mode = none; The mode cannot be found because no data entry is repeated. The mean does not represent the center of the data set because it is influenced by the outlier of 42.
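The mean, median, and mode reported in these answers can be checked with Python's `statistics` module. The sketch below follows the convention of exercise 25 that a data set with no repeated entry has no mode, and it uses the sample data sets from exercises 5 and 7 as inputs. The helper name `center_summary` is hypothetical.

```python
from collections import Counter
from statistics import mean, median


def center_summary(data):
    """Return (mean, median, modes); modes is empty when no entry repeats."""
    counts = Counter(data)
    top = max(counts.values())
    # Convention used in these answers: if no entry repeats, there is no mode.
    modes = [] if top == 1 else sorted(v for v, c in counts.items() if c == top)
    return mean(data), median(data), modes


if __name__ == "__main__":
    print(center_summary([1, 2, 2, 2, 3]))   # mean, median, and mode all equal 2
    print(center_summary([2, 5, 7, 9, 35]))  # the entry 35 pulls the mean above the median
```

The second call shows why the median is preferred when an outlier is present: the mean moves toward 35 while the median stays at 7.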


27. x̄ is not possible; median is not possible; mode = "Search and buy online"; The mean and median cannot be found because the data are at the nominal level of measurement.

29. x̄ is not possible; median is not possible; mode = "Junior"; The mean and median cannot be found because the data are at the nominal level of measurement.

31. x̄ ≈ 29.2; median = 30.5; mode = 23, 34

33. x̄ ≈ 19.5; median = 20; mode = 15

35. Cluster around 275-425

37. Mode, because the data are at the nominal level of measurement.

39. Mean, because the distribution is symmetric and there are no outliers.

41. 79  43. $612.73  45. 84  47. 81.5

49. 26.178 kilometers per hour  51. 37.02 years old

53.
Class | Frequency, f | Midpoint, x
127-161 | 7 | 144
162-196 | 6 | 179
197-231 | 3 | 214
232-266 | 3 | 249
267-301 | 1 | 284
[Figure: Hospital Beds; horizontal axis: Number of beds, with class midpoints 144 to 284]
Positively skewed

55.
Class | Midpoint, x | Frequency, f
40-45 | 42.5 | 7
46-51 | 48.5 | 10
52-57 | 54.5 |
58-63 | 60.5 |
64-69 | 66.5 |
[Figure: Weights of Females; horizontal axis: Weight (to the nearest kilogram), with class midpoints 42.5 to 66.5]
Positively skewed

57. (a) x̄ ≈ 20, median = 20.3
(b) x̄ ≈ 19.875, median = 20.1
(c) Median
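The grouped-data means above and the grade-point-average answer in exercise 61 both rely on a weighted mean, x̄ = Σ(x·w)/Σw. In the sketch below the midpoints and frequencies come from the Hospital Beds table in exercise 53, while the grade points and credit hours in the GPA check are hypothetical values chosen to mirror exercise 61. The function name `weighted_mean` is also hypothetical.

```python
def weighted_mean(values, weights):
    """Return sum(x * w) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)


if __name__ == "__main__":
    # Grouped data: class midpoints weighted by class frequencies (Hospital Beds table).
    print(weighted_mean([144, 179, 214, 249, 284], [7, 6, 3, 3, 1]))  # 187.75

    # Hypothetical GPA check: grade points (B = 3, A = 4) weighted by credit hours.
    credits = [3, 2, 4]
    print(weighted_mean([3, 3, 4], credits))  # current GPA
    print(weighted_mean([4, 3, 4], credits))  # three-credit B raised to an A
    print(weighted_mean([3, 4, 4], credits))  # two-credit B raised to an A
```

Raising the three-credit B adds 3 weighted points while raising the two-credit B adds only 2, which is the reasoning behind the answer to exercise 61.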


59. The data are skewed right.
A = mode, because it is the data entry that occurred most often.
B = median, because the median is to the left of the mean in a skewed right distribution.
C = mean, because the mean is to the right of the median in a skewed right distribution.

61. Increase one of the three-credit B classes to an A. The three-credit class is weighted more than the two-credit classes, so it will have a greater effect on the grade point average.

63. (a) Mean, because Car A has the highest mean of the three.
(b) Median, because Car B has the highest median of the three.
(c) Mode, because Car C has the highest mode of the three.

65. (a) x̄ ≈ 49.2; median = 46.5
(b) [Stem-and-leaf plot: Test Scores; Key: 3|6 = 36; the mean and median are marked on the plot]
(c) Positively skewed

Section 2.3 Activity (page 103)


1. The distribution is symmetric. The mean and median both decrease slightly. Over time, the median will decrease dramatically and the mean will also decrease, but to a lesser degree.

2. Neither the mean nor the median can be any of the points that were plotted. Because there are 10 points in each region, the mean will fall somewhere between the two regions. By the same logic, the median will be the average of the greatest point between 0 and 0.75 and the least point between 20 and 25.

Section 2.4 (page 115)


1. The range is the difference between the maximum and minimum values of a data set. The advantage of the range is that it is easy to calculate. The disadvantage is that it uses only two entries from the data set.

3. The units of variance are squared, so they are often meaningless (example: dollars²). The units of standard deviation are the same as the units of the data.

5. When calculating the population standard deviation, you divide the sum of the squared deviations by N, then take the square root of that value. When calculating the sample standard deviation, you divide the sum of the squared deviations by n − 1, then take the square root of that value.
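The only computational difference between the two standard deviations in exercise 5 is the divisor: N for a population and n − 1 for a sample. The sketch below implements both directly and cross-checks them against `statistics.pstdev` and `statistics.stdev`; the data set is the sample answer from exercise 25, and the function names are hypothetical.

```python
import math
import statistics


def population_sd(data):
    # Divide the sum of squared deviations by N, then take the square root.
    mu = sum(data) / len(data)
    return math.sqrt(sum((x - mu) ** 2 for x in data) / len(data))


def sample_sd(data):
    # Divide the sum of squared deviations by n - 1, then take the square root.
    xbar = sum(data) / len(data)
    return math.sqrt(sum((x - xbar) ** 2 for x in data) / (len(data) - 1))


if __name__ == "__main__":
    data = [3, 3, 3, 7, 7, 7]
    print(population_sd(data), statistics.pstdev(data))  # both 2.0
    print(sample_sd(data), statistics.stdev(data))       # both about 2.19
```

Because n − 1 is smaller than N, the sample standard deviation is always at least as large as the population formula applied to the same numbers.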


7. Similarity: Both estimate proportions of the data contained within k standard deviations of the mean.
Difference: The Empirical Rule assumes the distribution is approximately symmetric and bell-shaped, and Chebychev's Theorem makes no such assumption.

9. Approximately 35, or $35,000

11. (a) 678 (b) 130.8

13. Range = 13; μ ≈ 5.83; σ² ≈ 16.36; σ ≈ 4.04

15. Range = 8; x̄ = 18; s² ≈ 4.7; s ≈ 2.2

17. The data set in (a) has a standard deviation of 2.4 and the data set in (b) has a standard deviation of 5 because the data in (b) have more variability.

19. Company A; An offer of $36,000 is within two standard deviations of the mean of Company A's starting salaries, which makes it likely. The same offer is three standard deviations from the mean of Company B's starting salaries, which makes the offer unlikely.

21. (a) Greatest sample standard deviation: (ii)
Data set (ii) has more entries that are farther away from the mean.
Least sample standard deviation: (iii)
Data set (iii) has more entries that are close to the mean.
(b) The three data sets have the same mean, median, and mode, but have different standard deviations.
(c) Estimates will vary; (i) s ≈ 1.1; (ii) s ≈ 1.3; (iii) s ≈ 0.8

23. (a) Greatest sample standard deviation: (i)
Data set (i) has more entries that are farther away from the mean.
Least sample standard deviation: (iii)
Data set (iii) has more entries that are close to the mean.
(b) The three data sets have the same mean, median, and mode, but have different standard deviations.
(c) Estimates will vary; (i) s ≈ 9.6; (ii) s ≈ 9.0; (iii) s ≈ 5.1

25. Sample answer: 3, 3, 3, 7, 7, 7

27. Sample answer: 12, 12, 12, 12, 12

29. 68%

31. (a) 51 (b) 17

33. 78, 76, and 82 are unusual; 82 is very unusual because it is more than 3 standard deviations from the mean.

35. 53

37. At least 75% of the student heights are from 117 to 133 centimeters.
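Exercise 7 contrasts the Empirical Rule with Chebychev's Theorem, and exercise 37 applies the theorem with k = 2. The sketch below computes Chebychev's lower bound 1 − 1/k² and the interval mean ± k standard deviations. The mean of 125 and standard deviation of 4 in the usage lines are an assumption, chosen only because they reproduce the 117-to-133 interval in exercise 37; the function names are hypothetical.

```python
def chebychev_minimum(k):
    """Smallest possible share of any data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2


def interval(mean, sd, k):
    """Endpoints of mean +/- k standard deviations."""
    return mean - k * sd, mean + k * sd


# Approximate Empirical Rule shares, valid only for roughly bell-shaped data.
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}

if __name__ == "__main__":
    print(chebychev_minimum(2))    # 0.75, i.e. at least 75%
    print(interval(125, 4, 2))     # (117, 133) under the assumed mean and sd
    print(EMPIRICAL_RULE[2])       # about 95% if the data were bell-shaped
```

Chebychev's bound holds for any distribution, which is why it is weaker (75% versus about 95% for k = 2) than the Empirical Rule.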
39.
x | f | xf | x − x̄ | (x − x̄)² | (x − x̄)²f
0 | 10 | 0 | −3.08 | 9.4864 | 94.864
1 | 12 | 12 | −2.08 | 4.3264 | 51.9168
2 | 9 | 18 | −1.08 | 1.1664 | 10.4976
3 | 2 | 6 | −0.08 | 0.0064 | 0.0128
4 | 2 | 8 | 0.92 | 0.8464 | 1.6928
5 | 3 | 15 | 1.92 | 3.6864 | 11.0592
6 | 1 | 6 | 2.92 | 8.5264 | 8.5264
7 | 4 | 28 | 3.92 | 15.3664 | 61.4656
8 | 2 | 16 | 4.92 | 24.2064 | 48.4128
9 | 5 | 45 | 5.92 | 35.0464 | 175.232
n = 50 | Σxf = 154 | Σ(x − x̄)²f = 463.68

x̄ ≈ 3.08; s ≈ 3.076


41.
Class | x | f | xf | x − x̄ | (x − x̄)² | (x − x̄)²f
15,000-17,499 | 16,249.5 | 9 | 146,245.5 | −4759.62 | 22,653,982.54 | 203,885,842.9
17,500-19,999 | 18,749.5 | 10 | 187,495 | −2259.62 | 5,105,882.54 | 51,058,825.4
20,000-22,499 | 21,249.5 | 16 | 339,992 | 240.38 | 57,782.54 | 924,520.64
22,500-24,999 | 23,749.5 | 11 | 261,244.5 | 2740.38 | 7,509,682.54 | 82,606,507.94
25,000 or more | 26,249.5 | 6 | 157,497 | 5240.38 | 27,461,582.54 | 164,769,495.20
n = 52 | Σxf = 1,092,474 | Σ(x − x̄)²f = 503,245,192.1

x̄ ≈ $21,009.12; s ≈ $3141.27

43.
x | f | xf | x − x̄ | (x − x̄)² | (x − x̄)²f
1 | 2 | 2 | −1.9 | 3.61 | 7.22
2 | 18 | 36 | −0.9 | 0.81 | 14.58
3 | 24 | 72 | 0.1 | 0.01 | 0.24
4 | 16 | 64 | 1.1 | 1.21 | 19.36
n = 60 | Σxf = 174 | Σ(x − x̄)²f = 41.4

x̄ ≈ 2.9; s ≈ 0.8
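The tables in exercises 39 to 43 compute a sample mean and standard deviation from a frequency distribution using x̄ = Σxf/n and s = √(Σ(x − x̄)²f/(n − 1)). The sketch below applies those formulas to the values and frequencies from exercises 39, 41, and 43 (for exercise 41, each class is represented by its midpoint). The function name `grouped_mean_sd` is hypothetical.

```python
import math


def grouped_mean_sd(values, freqs):
    """Sample mean and standard deviation of a frequency distribution."""
    values, freqs = list(values), list(freqs)
    n = sum(freqs)
    xbar = sum(x * f for x, f in zip(values, freqs)) / n
    # Each squared deviation counts once per occurrence, hence the factor f.
    ss = sum((x - xbar) ** 2 * f for x, f in zip(values, freqs))
    return xbar, math.sqrt(ss / (n - 1))


if __name__ == "__main__":
    # Exercise 39: x = 0..9 with these frequencies (n = 50).
    print(grouped_mean_sd(range(10), [10, 12, 9, 2, 2, 3, 1, 4, 2, 5]))
    # Exercise 41: class midpoints stand in for each class (n = 52).
    print(grouped_mean_sd([16249.5, 18749.5, 21249.5, 23749.5, 26249.5],
                          [9, 10, 16, 11, 6]))
    # Exercise 43 (n = 60).
    print(grouped_mean_sd([1, 2, 3, 4], [2, 18, 24, 16]))
```

Running the first call gives x̄ = 3.08 and s ≈ 3.076, matching Σ(x − x̄)²f = 463.68 divided by n − 1 = 49.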
45. CV_Denver ≈ 9.7%, CV_LA ≈ 8.8%
Salaries for entry-level architects are more variable in Denver than in Los Angeles.
47. CV_ages ≈ 13.3%, CV_heights ≈ 3.5%
Ages are more variable than heights for all members of the 2016 Women's U.S. Olympic swimming team.
49. CV_males ≈ 6.81%, CV_females ≈ 3.84%
Weights are more variable for males than for females.
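The coefficients of variation in exercises 45 to 49 express the standard deviation as a percentage of the mean, CV = (s/x̄)·100%, which lets data sets with different units or scales be compared. The sketch below computes CV from raw data with the `statistics` module; the two salary data sets are hypothetical because the original data are not reproduced in this answer key.

```python
import statistics


def coefficient_of_variation(data, sample=True):
    """Standard deviation as a percentage of the mean (assumes a positive mean)."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data) if sample else statistics.pstdev(data)
    return sd / mean * 100


if __name__ == "__main__":
    # Hypothetical salaries (in thousands of dollars) for two cities.
    city_a = [58, 62, 66, 70, 74]
    city_b = [60, 63, 66, 69, 72]
    print(f"CV A: {coefficient_of_variation(city_a):.1f}%")
    print(f"CV B: {coefficient_of_variation(city_b):.1f}%")
```

Both hypothetical data sets have the same mean, so the one with the larger spread (city A) has the larger CV and is the more variable of the two.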
51. (a) Answers will vary. (b) s ≈ 2.2
(c) They are the same.
53. (a) x̄ ≈ 42.1; s ≈ 5.6 (b) x̄ ≈ 44.3; s ≈ 5.9
(c) 3.5, 3, 3, 4, 4, 2.75, 4.25, 3.25, 3.25, 3.5, 3.25, 3.75, 3.5, 4.17
x̄ ≈ 3.5; s ≈ 0.47
(d) When each entry is multiplied by a constant k, the new sample mean is k·x̄, and the new sample standard deviation is k·s.
55. (a) P ≈ −2.61
The data are skewed left.
(b) P = 4.12
The data are skewed right.
(c) P = 0
The data are symmetric.
(d) P = 1
The data are skewed right.
(e) P = −3
The data are skewed left.
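Exercise 55 classifies skewness with Pearson's index, P = 3(x̄ − median)/s. The sketch below computes P from summary statistics or from raw data and labels the sign the same way the answers do (negative: skewed left, zero: symmetric, positive: skewed right). The summary values and the data set in the usage lines are hypothetical, as are the function names.

```python
import statistics


def pearson_index(mean, median, sd):
    """Pearson's index of skewness, P = 3(mean - median) / sd."""
    return 3 * (mean - median) / sd


def pearson_index_from_data(data):
    # Uses the sample standard deviation (divisor n - 1).
    return pearson_index(statistics.mean(data), statistics.median(data),
                         statistics.stdev(data))


def describe(p):
    # Label by sign, the way the answers to exercise 55 do.
    if p < 0:
        return "skewed left"
    if p > 0:
        return "skewed right"
    return "symmetric"


if __name__ == "__main__":
    p = pearson_index(17, 19, 2)            # hypothetical mean, median, sd
    print(p, describe(p))                   # -3.0 skewed left
    print(describe(pearson_index_from_data([1, 1, 1, 2, 10])))  # skewed right
```

A mean pulled above the median by a long right tail makes P positive, which is the intuition behind the sign-based labels.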


Section 2.4 Activity (page 122)

1. When a point with a value of 15 is added, the mean remains constant or changes very little, and the standard deviation decreases. When a point with a value of 20 is added, the mean is raised and the standard deviation increases.

2. To get the largest standard deviation, plot four of the points at 30 and four of the points at 40.
To get the smallest standard deviation, plot all of the points at the same number. That way, each x − x̄ is 0, so the standard deviation will be 0.

Section 2.5 (page 131)

1. The talk is longer than 75% of the lectures in the series.

3. The student scored higher than 89% of the students who took the Fundamentals of Engineering exam.

5. The interquartile range of a data set can be used to identify outliers because data entries that are greater than Q3 + 1.5(IQR) or less than Q1 − 1.5(IQR) are considered outliers.

7. True

9. False. An outlier is any number above Q3 + 1.5(IQR) or below Q1 − 1.5(IQR).
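The outlier rule in exercises 5 and 9 uses the fences Q1 − 1.5(IQR) and Q3 + 1.5(IQR). The sketch below finds the quartiles as the medians of the lower and upper halves of the ordered data, leaving out the overall median when n is odd. That is one common convention and an assumption here; other methods, such as `statistics.quantiles`, can give slightly different quartiles. The data set and function names are hypothetical.

```python
from statistics import median


def five_number_summary(data):
    """Min, Q1, Q2, Q3, Max, with quartiles taken as medians of the lower and upper halves."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    # For odd n the middle value is the median and belongs to neither half.
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


def iqr_outliers(data):
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8, 30]   # hypothetical data with one extreme value
    print(five_number_summary(data))     # (1, 2.5, 5, 7.5, 30)
    print(iqr_outliers(data))            # [30]
```

Here IQR = 7.5 − 2.5 = 5, so the fences are −5 and 15, and only 30 falls outside them.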
11. (a) Q1 = 41, Q2 = 46, Q3 = 48 (b) IQR = 7 (c) 65
13. Min = 0, Q1 = 2, Q2 = 5, Q3 = 8, Max = 10
15. (a) Min = 45, Q1 = 52, Q2 = 49, Q3 = 65, Max = 79
(b) [Box plot with labels 45, 52, 49, 65, 79]
17. (a) Min = 1, Q1 = 4.5, Q2 = 6, Q3 = 7.5, Max = 9
(b) [Box plot with labels 1, 4.5, 6, 7.5, 9 on a scale from 0 to 9]

19. None. The data are not skewed or symmetric.

21. Skewed left. Most of the data lie to the right on the box plot.

23. [Box plot: Extra Classes; horizontal axis: Number of extra classes, from 1 to 4]

25. [Box plot: Hours Worked in a Month; labels 131, 153.5, 171.5, 195.25, 238; horizontal axis: Number of hours]

27. (a) 2 hours (b) 50% (c) 75%

29. About 158; About 70% of quantitative reasoning scores on the Graduate Record Examination are less than 158.


31. 


33. 
35. 


37. 


39. 
41. 


43. 


45. 


47. 


49. 


51. 


53. 


55. 
57. 


About 8th percentile; About 8% of quantitative reasoning 
scores on the Graduate Record Examination are less that 140. 
40th percentile 

28, 35, 38, 40, 41, 41, 42, 43, 43, 43, 44, 45, 47, 47, 48, 50, 50, 50, 
53, 54, and 54. 


[Figure: Department of Motor Vehicles Wait Times; x-axis: Time (in minutes)]
About 85th percentile 


A → z = −1.43
B → z = 0
C → z = 2.14


A z-score of 2.14 would be unusual. 
Not unusual; The z-score is 0.94, so the age of 31 is about 
0.94 standard deviation above the mean. 
Not unusual; The z-score is −0.27, so the age of 27 is about
0.27 standard deviation below the mean.
Unusual; The z-score is −2.39, so the age of 20 is about
2.39 standard deviations below the mean.
(a) For 1250, z = 2.5; For 1175, z = 1.75;
For 950, z = −0.5
The ladybug with a life span of 1250 days has an
unusually long life span.
(b) For 1150, about 93rd percentile; 
For 910, about 18th percentile; 
For 845, about 6th percentile 
Robert Duvall: z ≈ 1.07; Jack Nicholson: z ≈ −0.32;
The age of Robert Duvall was about 1 standard deviation 
above the mean age of Best Actor winners, and the age of 
Jack Nicholson was less than 1 standard deviation below the 
mean age of Best Supporting Actor winners. Neither actor’s 
age is unusual. 
John Wayne: z ≈ 2.10; Gig Young: z ≈ 0.41; The age of
John Wayne was more than 2 standard deviations above the 
mean age of Best Actor winners, which is unusual. The age 
of Gig Young was less than 1 standard deviation above the 
mean age of Best Supporting Actor winners, which is not 
unusual. 
5 
(a) The distribution of Concert 1 is symmetric. The 
distribution of Concert 2 is skewed right. Concert 1 has 
less variation. 
(b) Concert 2 is more likely to have outliers because it has 
more variation. 
(c) Concert 1, because 68% of the data should be within
±16.3 of the mean.
(d) No, you do not know the number of songs played at 
either concert or the actual lengths of the songs. 


59. (a) 24, 2 (b) [Modified box plot with labeled values 2, 7, 9, 11, 13, 16, 24; axis from 0 to 30]


61. (a) 1 (b) [Modified box plot with labeled values 1, 23, 32, 46, 52.5, 83; axis from 0 to 90]


63. Answers will vary. 
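The two position tools used in these answers reduce to short formulas. The first is the pair of 1.5 × IQR fences for outliers. The second is the standard score z = (x − μ)/σ, where |z| > 2 counts as unusual, which is the convention the answers above follow. This is a minimal Python sketch, and the function names are hypothetical.

The check values come from two places:

- The quartiles are the ones printed for Exercise 11 (Q1 = 41, Q3 = 48).
- The z-score example uses μ = 1000 and σ = 100. These are inferred from the ladybug answers (z = 2.5 for 1250 days and z = −0.5 for 950 days), not quoted from the exercise.

```python
def outlier_fences(q1: float, q3: float) -> tuple[float, float]:
    """Return the (lower, upper) fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_outlier(x: float, q1: float, q3: float) -> bool:
    """An entry is an outlier when it falls outside the fences."""
    lower, upper = outlier_fences(q1, q3)
    return x < lower or x > upper


def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


def is_unusual(z: float) -> bool:
    # Convention used in these answers: more than 2 standard deviations from the mean.
    return abs(z) > 2


# Exercise 11: Q1 = 41 and Q3 = 48, so IQR = 7.
print(outlier_fences(41, 48))   # (30.5, 58.5)
print(is_outlier(65, 41, 48))   # True: 65 lies above the upper fence

# Assumed mu = 1000 and sigma = 100 (inferred from the printed z-scores).
for days in (1250, 1175, 950):
    z = z_score(days, 1000, 100)
    print(days, z, is_unusual(z))
```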


Uses and Abuses for Chapter 2 (page 136)


1. Answers will vary. 

2. No, it is not ethical because it misleads the consumer 
to believe that drinking red wine is more effective at 
preventing heart disease than it may actually be. 


Review Exercises for Chapter 2 (page 138) 


1.
Class | Midpoint | Class boundaries | Frequency, f | Relative frequency | Cumulative frequency
26-31 | 28.5 | 25.5-31.5 | 5 | 0.25 | 5
32-37 | 34.5 | 31.5-37.5 | 4 | 0.20 | 9
38-43 | 40.5 | 37.5-43.5 | 6 | 0.30 | 15
44-49 | 46.5 | 43.5-49.5 | 3 | 0.15 | 18
50-55 | 52.5 | 49.5-55.5 | 2 | 0.10 | 20
Σf = 20 | Σ(f/n) = 1


3. [Histogram: Liquid Volume 12-oz Cans; x-axis: Actual volume (in ounces); y-axis: Frequency]


11. 


13. 
15. 
23. 


25. 
27. 


ODD ANSWERS A51 


Class | Midpoint | Frequency, f
79-93 | 86 | 9
94-108 | 101 | 12
109-123 | 116 | 5
124-138 | 131 | 4
139-153 | 146 | 2
154-168 | 161 | 1
Σf = 33


[Figure: Rooms Reserved; x-axis: Number of rooms; y-axis: Frequency]


Pollution Indices of U.S. Cities
2 | 2388  Key: 2|2 = 22
3 | 2368899
4 | 11369
5 | 03467
6 | 355

Sample answer: Most U.S. cities have a pollution index from
32 to 57.


[Pie chart: College Students’ Activities and Time Use; slices labeled Sleeping (36.67%), Other (22.50%), Leisure and sports, Educational activities, and Working; other printed percentages: 16.67% and 9.58%]


Sample answer: Full-time university and college students 
spend the least amount of time working. 


[Scatter plot: Heights of Buildings; x-axis: Height (in feet)]
Sample answer: The number of stories appears to increase 
with height. 


x̄ = 29.5; median = 29.5; mode = 29.5

82.45 17. 38.4 19. Skewed right 21. Skewed right 
Mean; When a distribution is skewed right, the mean is to 
the right of the median. 

Range = 8; μ ≈ 81; σ² ≈ 5.92; σ ≈ 2.43

Range = $1262; x̄ ≈ $1106.27; s² ≈ 182,714.35; s ≈ $427.45




29. 
33. 
35. 


37. 
39. 


41 
45 


47. 




$13 and $23 31. 15 students 
x̄ ≈ 25; s ≈ 12


CV_A ≈ 28.96%; CV_B ≈ 20.2%

Dividends are more variable for Company A than Company B.
Min = 16, Q1 = 25, Q2 = 35, Q3 = 56, Max = 136

[Box plot: Model 2017 Vehicle Fuel Economies; labeled values 16, 25, 35, 56, 136; x-axis: Fuel economy (in miles per gallon)]


7 inches 43. 85% 

Not unusual; The z-score is 1.97, so a towing capacity of
16,500 pounds is about 1.97 standard deviations above the 
mean. 

Unusual; The z-score is 2.60, so a towing capacity of 
18,000 pounds is about 2.60 standard deviations above the 
mean. 
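The coefficient of variation used in these answers divides the standard deviation by the mean: CV = (σ/μ) · 100%, or s/x̄ for a sample. This lets you compare the spread of data sets that have different means, as in the dividend comparison above. This is a minimal Python sketch, and the function name is hypothetical. The check reuses the printed sample statistics x̄ ≈ $1106.27 and s ≈ $427.45 from above, so the result inherits their rounding.

```python
def coefficient_of_variation(std_dev: float, mean: float) -> float:
    """Return CV = (standard deviation / mean) * 100, as a percent."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return std_dev / mean * 100


# Printed sample statistics: x-bar ~ $1106.27, s ~ $427.45
cv = coefficient_of_variation(427.45, 1106.27)
print(f"CV = {cv:.1f}%")  # CV = 38.6%
```

Because CV has no units, a larger CV means more variation *relative to the mean*. That is why the answer above can compare the two companies' dividends directly.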


Quiz for Chapter 2 (page 142) 
1. (a)
Class | Midpoint | Class boundaries | Frequency, f | Relative frequency | Cumulative frequency
101-112 | 106.5 | 100.5-112.5 | 3 | 0.11 | 3
113-124 | 118.5 | 112.5-124.5 | 11 | 0.41 | 14
125-136 | 130.5 | 124.5-136.5 | 8 | 0.30 | 22
137-148 | 142.5 | 136.5-148.5 | 3 | 0.11 | 25
149-160 | 154.5 | 148.5-160.5 | 2 | 0.07 | 27
(b) [Histogram: Weekly Exercise; x-axis: Number of minutes]
(c) [Relative frequency histogram: Weekly Exercise; x-axis: Number of minutes (class midpoints 106.5, 118.5, 130.5, 142.5, 154.5); y-axis: Relative frequency]


(d) Skewed right 


(e) [Ogive: Weekly Exercise; x-axis: Number of minutes; y-axis: Cumulative frequency]
(f) Weekly Exercise (in minutes)
10 | 18  Key: 10|8 = 108
11 | 1467899
12 | 00334778
13 | 0112599
14 | 2
15 | 07

(g) [Box plot: Weekly Exercise; labeled values 101, 118, 124, 132, 157; x-axis: Number of minutes]


2. x̄ ≈ 126.1; s ≈ 13.0
3. (a) [Pie chart: Elements with Known Properties; slices labeled Other nonmetals (6.3%), Rare earth elements (26.8%), Noble gases (5.4%), Metalloids, and Halogens; remaining printed percentages: 6.3% and 4.5%]

(b) [Pareto chart: Elements with Known Properties; categories: Metals, Rare earth elements, Other nonmetals, Metalloids, Noble gases, Halogens; x-axis: Element; y-axis: Number of elements]


4. (a) x̄ ≈ 1016.4; median = 1019; mode = 1100; The mean
or median best describes a typical salary because there
are no outliers.

(b) Range = 666; s² ≈ 47,120.9; s ≈ 217.1

(c) CV ≈ 21.4%


5. $150,000 and $210,000 


Real Statistics—Real Decisions for Chapter 2 


6. (a) Unusual; The z-score is 3, so a new home price of
$225,000 is about 3 standard deviations above the mean.
(b) Unusual; The z-score is −6.67, so a new home price of
$80,000 is about 6.67 standard deviations below the mean.
(c) Not unusual; The z-score is 1.33, so a new home price
of $200,000 is about 1.33 standard deviations above the
mean.
(d) Unusual; The z-score is −2.2, so a new home price of
$147,000 is about 2.2 standard deviations below the mean.

[Box plot: Wins for Each MLB Team; labeled values 59, 71, 82.5, 89, 103; x-axis: Number of wins]


(page 144) 


1. (a) Find the average cost of renting an apartment for each
area and compare the averages.
(b) The mean would best represent the data sets for the
four areas of the city.
(c) Area A: x̄ = $1131.58
Area B: x̄ = $998.33
Area C: x̄ = $991.58
Area D: x̄ = $1064.17


2. (a) Construct a Pareto chart, because the data are
quantitative and a Pareto chart positions data in order
of decreasing height, with the tallest bar positioned at
the left.


(b) [Pareto chart: Cost of Monthly Rent per Area; y-axis: Mean monthly rent (in dollars)]


(c) Yes. From the Pareto chart, you can see that Area A has 
the highest average cost of monthly rent, followed by 
Area D, Area B, and Area C. 


3. Sample answer: 


(a) You could use the range and sample standard deviation 
for each area. 


(b) Area A: range = $467; s ≈ $138.45
Area B: range = $474; s ≈ $163.11
Area C: range = $518; s ≈ $164.51
Area D: range = $560; s ≈ $156.26


(c) No. Area A has the lowest range and standard deviation,
so the rents in Areas B-D are more spread out. There
could be one or two inexpensive rents that lower the
means for these areas. It is possible that the population
means of Areas B-D are close to the population mean
of Area A.


4. (a) Answers will vary.
(b) Location, weather, population


Cumulative Review for Chapters 1-2 (page 148)

1. Systematic sampling is used because every fortieth
toothbrush from each assembly line is tested. It is possible
for bias to enter into the sample if, for some reason, an
assembly line makes a consistent error.


2. Simple random sampling is used because each telephone
number has an equal chance of being dialed, and all
samples of 1090 phone numbers have an equal chance
of being selected. The sample may be biased because
telephone sampling only samples those individuals who
have telephones, who are available, and who are willing to
respond.
3. [Bar chart: Workplace Fraud; x-axis: Fraud detection]


4. Parameter. The median salary is based on all marketing
account executives.


5. Statistic. The percent, 88%, is based on a subset of the 
population. 
6. (a) 95% 
(b) For $93,500, z ≈ 4.67; For $85,600, z ≈ −0.6; For
$82,750, z = −2.5.
The salaries of $93,500 and $82,750 are unusual.
7. Population: Collection of opinions of all college and
university admissions directors and enrollment officers.
Sample: Collection of opinions of the 339 college and
university admissions directors and enrollment officers
surveyed.


8. Population: Reasons for pain reliever use of all Americans
ages 12 or older
Sample: Reasons for pain reliever use of the 67,901
Americans ages 12 or older surveyed


9. Experiment. The study applies a treatment (digital device)
to the subjects.

10. Observational study. The study does not attempt to influence
the responses of the subjects.

11. Quantitative; Ratio

12. Qualitative; Nominal

13. (a) [Box plot: Tornadoes by State; labeled values 0, 2, 10, 32, 99; x-axis: Number of tornadoes]


(b) Skewed right 


14. 88.9 
15. (a) x̄ ≈ 5.49; median = 5.4; mode = none; Both the
mean and the median accurately describe a typical
American alligator tail length.

(b) Range = 4.1; s² ≈ 2.34; s ≈ 1.53

16. (a) An inference drawn from the study is that the life
expectancies for Americans will continue to increase or 
remain stable. 

(b) This inference may incorrectly imply that Americans 
will have higher life expectancies in the future. 


Class | Midpoint | Class boundaries | Frequency, f | Relative frequency | Cumulative frequency
0-8 | 4 | −0.5-8.5 | 20 | 0.500 | 20
9-17 | 13 | 8.5-17.5 | 7 | 0.175 | 27
18-26 | 22 | 17.5-26.5 | 6 | 0.150 | 33
27-35 | 31 | 26.5-35.5 | 1 | 0.025 | 34
36-44 | 40 | 35.5-44.5 | 2 | 0.050 | 36
45-53 | 49 | 44.5-53.5 | 1 | 0.025 | 37
54-62 | 58 | 53.5-62.5 | 2 | 0.050 | 39
63-71 | 67 | 62.5-71.5 | 1 | 0.025 | 40
Skewed right 
[Relative frequency histogram: Montreal Canadiens Points Scored; x-axis: Number of points scored (per player)]
Class with greatest frequency: 0-8 
Classes with least frequency: 27-35, 45-53, and 63-71 


Chapter 3 


Section 3.1 (page 162)

1. An outcome is the result of a single trial in a probability
experiment, whereas an event is a set of one or more
outcomes.

3. The probability of an event cannot exceed 100%.


29. 


31. 


33. 


35. 


37. 
47. 
53. 


55. 


57. 


59. 
65. 
67. 
69. 
79. 
81. 


ad af 
. 3:19. 0.97 
The law of large numbers states that as an experiment
is repeated over and over, the probabilities found in the
experiment will approach the actual probabilities of the
event. Examples will vary.

False. The event “choosing false on a true or false question
and choosing A or B on a multiple choice question” is not
simple because it consists of two possible outcomes and can
be represented as A = {FA, FB}.

False. A probability of less than 1/20 = 0.05 indicates an
unusual event.
13. b 14. c 15. a 16. e
21. 0.05 23. 4

{A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z}; 26


{A♥, K♥, Q♥, J♥, 10♥, 9♥, 8♥, 7♥, 6♥, 5♥, 4♥, 3♥, 2♥,
A♦, K♦, Q♦, J♦, 10♦, 9♦, 8♦, 7♦, 6♦, 5♦, 4♦, 3♦, 2♦,
A♠, K♠, Q♠, J♠, 10♠, 9♠, 8♠, 7♠, 6♠, 5♠, 4♠, 3♠, 2♠,
A♣, K♣, Q♣, J♣, 10♣, 9♣, 8♣, 7♣, 6♣, 5♣, 4♣, 3♣, 2♣}; 52


[Tree diagram]
{HH, HT, TH, TT}; 4


[Tree diagram]
{(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
(2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6),
(4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6),
(5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6),
(6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)}; 36


1; Simple event because it is an event that consists of a single 
outcome. 

13; Not a simple event because it is an event that consists of 
more than a single outcome. 


616 39. 24,000 41. 0.083 43. 0.17 45. 0.25 
0.24 49. 0.216 51. 0.344 
Classical probability because each outcome in the sample
space is equally likely to occur.

Empirical probability because survey results were used 
to calculate the frequency of a person watching a movie 
every day. 

Classical probability because each outcome in the sample 
space is equally likely to occur. 


0.842 61. 0.777 63. 0.042; unusual 

0.208; not unusual 

(a) 0.0015 (b) 0.9985

0.125 71. 0.375 73. 0.450 75. 0.033 77. 0.275
No; None of the events have a probability of 0.05 or less. 
(a) 0.5 (b) 0.25 (c) 0.25 83. 0.808 85. 0.103 


87. (a) 0.313 (b) 0.078 

(c) 0.031; This event is unusual because its probability is less 
than or equal to 0.05. 

89. The probability of randomly choosing a cricket player who
did not play for the school team

91. No. The odds of winning a prize are 1 : 6 (one winning cap 
and six losing caps). So, the statement should read, “one in 
seven game pieces wins a prize.” 


93. (a) 0.444 (b) 0.556 95. 39:13 = 3:1
97. (a)
Sum | Probability
2 | 0.028
3 | 0.056
4 | 0.083
5 | 0.111
6 | 0.139
7 | 0.167
8 | 0.139
9 | 0.111
10 | 0.083
11 | 0.056
12 | 0.028


(b) Answers will vary. 
(c) Answers will vary. 
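The probability table in Exercise 97(a) can be reproduced by listing the sample space. This is a minimal Python sketch that assumes two fair six-sided dice, so all 36 ordered outcomes are equally likely (classical probability). `Fraction` keeps each probability exact until it is rounded to three decimals.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: all 36 ordered outcomes (first die, second die).
outcomes = list(product(range(1, 7), repeat=2))
counts = Counter(a + b for a, b in outcomes)

# Classical probability: (number of outcomes in the event) / (size of sample space)
distribution = {s: Fraction(counts[s], len(outcomes)) for s in range(2, 13)}
for total, prob in distribution.items():
    print(total, prob, round(float(prob), 3))  # e.g. "7 1/6 0.167"
```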


Section 3.1 Activity (page 168)


1-2. Answers will vary.


Section 3.2 (page 174)


1. Two events are independent when the occurrence of one of
the events does not affect the probability of the occurrence
of the other event, whereas two events are dependent
when the occurrence of one of the events does affect the
probability of the occurrence of the other event.

3. The notation P(B|A) means the probability of event B
occurring, given that event A has occurred.

5. False. If two events are independent, then P(A|B) = P(A).

7. (a) 0.526 (b) 0.159

9. Dependent. The outcome of the first card drawn affects the
outcome of the second card drawn.

11. Dependent. The outcome of returning a movie after its due
date affects the outcome of receiving a late fee.

13. Independent. The outcome of returning a library book
before its due date does not affect the outcome of issuing a
new book.

15. Events: having more cone cells, perceiving colors differently;
Independent. Having more cone cells does not cause any
difference in the perception of colors.

17. Events: eating chocolates, improvement in cardiovascular
health; Dependent. Eating chocolates causes improvement
in cardiovascular health.

19. 0.006 21. 0.001

23. (a) 0.040 (b) 0.640 (c) 0.360


25. (a) 0.022 (b) 0.722 (c) 0.278 
(d) The event in part (a) is unusual because its probability 
is less than or equal to 0.05. 
27. (a) 0.011 (b) 0.022 (c) 0.978 
(d) The events in parts (a) and (b) are unusual because their 
probabilities are less than or equal to 0.05. 
29. (a) 0.007  (b) 0.589 
(c) Yes, this is unusual because the probability is less than 
or equal to 0.05. 


31. 0.32 33. 0.444 35. 0.167 37. 0.705 

39. (a) 0.074 (b) 0.999 41. 0.954 
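Exercise 9 above turns on dependence. When cards are drawn without replacement, the first draw changes the second, so the Multiplication Rule uses a conditional probability: P(A and B) = P(A) · P(B | A). This is a minimal Python sketch with a standard 52-card deck. The two-aces event is an illustration, not one of the exercises.

```python
from fractions import Fraction

# Event A: the first card is an ace (4 aces in a standard 52-card deck)
p_first_ace = Fraction(4, 52)
# Without replacement, one ace and one card are gone: P(B | A) = 3/51
p_second_given_first = Fraction(3, 51)
p_two_aces = p_first_ace * p_second_given_first
print(p_two_aces, round(float(p_two_aces), 4))  # 1/221 0.0045

# With replacement the draws are independent, so P(A and B) = P(A) * P(B)
p_two_aces_replaced = Fraction(4, 52) ** 2
print(p_two_aces_replaced)  # 1/169
```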

Section 3.3 (page 184) 

1. P(A and B) = 0 because A and B cannot occur at the same 
time. 

3. True 


5. False. The probability that event A or event B will occur is
P(A or B) = P(A) + P(B) − P(A and B).
7. Not mutually exclusive. A presidential candidate can lose 
the popular vote and win the election. 
9. Mutually exclusive. A student cannot study for more than 5 
hours but less than 2 hours daily. 
11. Not mutually exclusive. A badminton player can be female
and 25 years old.
13. 0.833 15. 0.985 
17. (a) 0.33 (b) 0.5 (c) 0.833 
19. (a) 0.06 (b) 0.426 (c) 0.81 (d) 0.201 


21. (a) 0.780 (b) 0.410 (c) 0.590 (d) 0.400 

23. (a) 0.520 (b) 0.899 (c) 0.909

25. (a) 0.589 (b) 0.762 (c) 0.461 (d) 0.922 

27. 0.63 

If events A, B, and C are not mutually exclusive, then P(A and
B and C) must be added because P(A) + P(B) + P(C)
counts the intersection of all three events three times and
−P(A and B) − P(A and C) − P(B and C) subtracts the
intersection of all three events three times. So, if P(A and B
and C) is not added at the end, then it will not be counted.
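The counting argument above can be checked by brute force. This minimal Python sketch compares the three-event Addition Rule with a direct count of the union:

P(A or B or C) = P(A) + P(B) + P(C) − P(A and B) − P(A and C) − P(B and C) + P(A and B and C)

The single die roll and the events are illustrative choices, not taken from an exercise: A is "even," B is "greater than 3," and C is "a divisor of 4."

```python
from fractions import Fraction

sample_space = set(range(1, 7))                   # one roll of a fair die
A = {x for x in sample_space if x % 2 == 0}       # even: {2, 4, 6}
B = {x for x in sample_space if x > 3}            # greater than 3: {4, 5, 6}
C = {x for x in sample_space if 4 % x == 0}       # divides 4: {1, 2, 4}


def P(event: set[int]) -> Fraction:
    """Classical probability of an event in the sample space."""
    return Fraction(len(event), len(sample_space))


formula = (P(A) + P(B) + P(C)
           - P(A & B) - P(A & C) - P(B & C)
           + P(A & B & C))
direct = P(A | B | C)                             # count the union directly
print(formula, direct)                            # 5/6 5/6
print(formula - P(A & B & C))                     # 2/3: too small without the last term
```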




Section 3.3 Activity (page 188)


1. Answers will vary.
2. The theoretical probability is 0.5, so the green line should be
placed there.


Section 3.4 (page 196) 
1. The number of ordered arrangements of n objects taken r at 
a time. 


Sample answer: An example of a permutation is the number 
of seating arrangements of you and three of your friends. 
3. False. A permutation is an ordered arrangement of objects. 
5. True 7. 15,120 9. 56 11. 0.076 13. 0.462 
15. Permutation. The order of the 16 floats in line matters. 
17. Combination. The order does not matter because the 
position of one captain is the same as the other. 
19. 362,880 21. 720 23. 6,840 25. 96,909,120


27. 360,360 29. 50,400 31. 4845 33. 2,118,760 
35. 6240 37. 785,910 39. 0.022 41. 0.005 
43. (a) 0.016 (b) 0.385 45. 0.0009
47. 0.242 49. 0.000000001 
51. 0.166 53. 0.070 55. 0.933 57. 0.086 
59. 0.066 61. 0.001 
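The counts in this section use nPr = n!/(n − r)!, where order matters, and nCr = n!/((n − r)! r!), where order does not. Python's `math.perm` and `math.comb` (Python 3.8 or later) evaluate both directly. This is a minimal sketch. The inputs are illustrative: they reproduce some printed values (for example 9! = 362,880 and 20P3 = 6,840), but the exercises' own n and r are not stated in these answers.

```python
import math

# Permutations: ordered arrangements of n objects taken r at a time
print(math.factorial(9))   # 362880 (all 9 objects arranged, 9P9 = 9!)
print(math.perm(20, 3))    # 6840 = 20 * 19 * 18
print(math.perm(9, 5))     # 15120 = 9 * 8 * 7 * 6 * 5

# Combinations: selections where order does not matter
print(math.comb(8, 3))     # 56 = 8P3 / 3!
```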
Uses and Abuses for Chapter 3 (page 200) 
1. (a) 0.000001 2. 0.001 3. 0.001 
Review Exercises for Chapter 3 (page 202) 
1. [Tree diagram]


0.258
Independent. The outcomes of the first three rolls do not
affect the outcome of the fourth roll.
0.7 31. 0.654
No; You do not know whether events A and B are mutually
exclusive.
110 43. 35
(a) 0.856 (b) 0.000062 (c) 0.144 (d) 0.999938
(a) 0.292 (b) 0.008 (c) 0.525 (d) 0.175


{HHHH, HHHT, HHTH, HHTT, HTHH, HTHT, HTTH,
HTTT, THHH, THHT, THTH, THTT, TTHH, TTHT, TTTH,
TTTT}; 4


{January, February, March, April, May, June, July, August,
September, October, November, December}; 3


30
Empirical probability because an experiment was used to
calculate the frequency of a component lasting for 3 hours.


Subjective probability because it is based on opinion.
Classical probability because all the outcomes in the event
and the sample space can be counted.
15,125 107 (17..0:555


Dependent. The outcome of regularly attending lectures
affects the outcome of passing the course.


0.04; Yes, the event is unusual because its probability is less
than or equal to 0.05.


Mutually exclusive. A jelly bean cannot be both completely
red and completely yellow.


Mutually exclusive. A person cannot simultaneously be of
Indian and Chinese origin.


33. 0.7 35. 0.584 37. 0.568
45. 2730 47. 2380
49. 0.000005


Quiz for Chapter 3 (page 206) 

1. 450,000 

2. (a) 0.713 (b) 0.662 (c) 0.778 (d) 0.937
(e) 0.049 (f) 0.606 (g) 0.346 (h) 0.515 


3. The event in part (e) is unusual because its probability is less 
than or equal to 0.05. 

4. Not mutually exclusive. A bowler can have the highest game 
in a 40-game tournament and still lose the tournament. 
Dependent. One event can affect the occurrence of the 
second event. 


5. 657,720 
6. (a) 2,481,115 (b) 1 (c) 2,572,999 
7. (a) 0.964 (b) 0.0000004 (c) 0.9999996


Real Statistics—Real Decisions for Chapter 3


1. (a) Sample answer: Investigate the number of possible
passwords when different sets of characters, such as
lowercase and capital letters, numbers, and special
characters, are allowed.

(b) You could use the definition of theoretical probability,
the Fundamental Counting Principle, and the
Multiplication Rule.

2. (a) Sample answer: Allow lowercase letters, uppercase 
letters, and numerical digits. 

(b) Sample answer: Because there are 26 lowercase letters,
26 uppercase letters, and 10 numerical digits, there are
26 + 26 + 10 = 62 choices for each character. So, there are
62^8 8-character passwords, and the probability of guessing a
password correctly on one try is 1/62^8 ≈ 4.6 × 10^-15.


(page 208) 


3. (a) Without the requirement, the number of possible PINs
is 10^5 = 100,000. With the requirement, the number of
possible PINs is 10P5 = 10 · 9 · 8 · 7 · 6 = 30,240.

(b) Sample answer: No, although the requirement would likely
discourage customers from choosing predictable PINs, the
number of possible PINs would significantly decrease,
and the most popular PIN, 12345, would still be allowed.
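The counts in the two password and PIN answers above can be verified directly. This is a minimal Python sketch that rests on three assumptions:

- PINs have 5 digits.
- "The requirement" means no digit may repeat, which is what 10^5 and 10P5 correspond to.
- Every 8-character password over the 62-character alphabet is equally likely to be chosen.

```python
import math

# 5-digit PINs using digits 0-9 with repetition allowed: 10^5
pins_any = 10 ** 5
# 5-digit PINs with no repeated digit: 10P5 = 10 * 9 * 8 * 7 * 6
pins_distinct = math.perm(10, 5)
print(pins_any, pins_distinct)  # 100000 30240

# 8-character passwords from 26 lowercase + 26 uppercase letters + 10 digits
alphabet_size = 26 + 26 + 10
passwords = alphabet_size ** 8          # Fundamental Counting Principle
p_guess = 1 / passwords                 # one random guess, all passwords equally likely
print(passwords, f"{p_guess:.2e}")      # 218340105584896 4.58e-15
```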


Chapter 4
Section 4.1 (page 219)


1. A random variable represents a value associated with each
outcome of a probability experiment.
Examples: Answers will vary.

3. No; The expected value may not be a possible value of x for
one trial, but it represents the average value of x over a large
number of trials.

5. False. In most applications, discrete random variables
represent counted data, while continuous random variables
represent measured data.

7. False. The mean of the random variable of a probability
distribution describes a typical outcome. The variance and
standard deviation of the random variable of a probability
distribution describe how the outcomes vary.


17. 


19. 


21. 
23. 
25. 
29. 


31. 


33. 


35. 


37. 


Section 4.2 
1. 


11 


Discrete; Attendance is a random variable that is countable.
Continuous; Distance traveled is a random variable that
must be measured.

Discrete; The number of cars in a university parking lot is a
random variable that is countable.

Discrete; The number of times a book is issued from the
library is a random variable that is countable.
Continuous; The weight of a student’s school bag is a
random variable that must be measured.


(a)
x | P(x)
0 | 0.35
1 | 0.27
2 | 0.23
3 | 0.15
(b) [Histogram: Cars per Household; x-axis: Number of Sedan cars; y-axis: Probability]


Skewed right 

(a) 0.50 (b) 0.62 (c) 0.65 (d) 0.38 

No, because the probability is greater than 0.05. 

0.33 27. No 

(a) μ ≈ 3.6; σ² ≈ 0.5; σ ≈ 0.7

(b) The mean is 3.6, so the average number of books per
shelf is about 3 or 4. The standard deviation is 0.7, so
most of the shelves differ from the mean by no more 
than 1 book. 

(a) μ ≈ 0.9; σ² ≈ 1.1; σ ≈ 1.0

(b) The mean is 0.9, so the average batch of LED lamps 
has about 0 or 1 defect. The standard deviation is 1.0, so 
most of the batches differ from the mean by no more 
than about 1 defect. 

(a) μ ≈ 2.0; σ² ≈ 1.0; σ ≈ 1.0

(b) The mean is 2.0, so the average hurricane that hits the
U.S. mainland is a category 2 hurricane. The standard
deviation is 1.0, so most of the hurricanes differ from the
mean by no more than 1 category level.

An expected value of 0 means that the money gained is equal
to the money spent, representing the break-even point.

−$0.05 39. $47,980 41. 1018; 30
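The break-even interpretation above follows from E(x) = Σ x P(x). This minimal Python sketch uses a hypothetical game invented for illustration: pay $2 to play and win a $10 prize. It shows an expected net gain of 0 at the break-even point and a negative expected value when the prize is rarer.

```python
from fractions import Fraction


def expected_value(distribution: dict[int, Fraction]) -> Fraction:
    """E(x) = sum of x * P(x) over a discrete probability distribution."""
    if sum(distribution.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in distribution.items())


# Hypothetical game: pay $2 to play; the $10 prize is a net gain of $8.
fair_game = {8: Fraction(1, 5), -2: Fraction(4, 5)}
unfair_game = {8: Fraction(1, 10), -2: Fraction(9, 10)}
print(expected_value(fair_game))    # 0  -> break-even
print(expected_value(unfair_game))  # -1 -> lose $1 per play on average
```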


(page 232) 


Each trial is independent of the other trials when the
outcome of one trial does not affect the outcome of any of
the other trials.

c; Because the probability is greater than 0.5, the distribution
is skewed left.

b; Because the probability is 0.5, the distribution is symmetric.

a; Because the probability is less than 0.5, the distribution is
skewed right.

c; The histogram shows probabilities for 12 trials.

a; The histogram shows probabilities for 4 trials.

b; The histogram shows probabilities for 8 trials.
As n increases, the distribution becomes more symmetric.

(3) 0,1 (4) 0,5 (5) 4,5

μ = 20; σ² = 12; σ ≈ 3.5




13. μ ≈ 32.2; σ² ≈ 23.9; σ ≈ 4.9
15. Binomial experiment
Success: frequent gamer who plays video games on
smartphone
n = 10; p = 0.36; q = 0.64; x = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
17. Binomial experiment 
Success: card drawn is a heart 
n = 4; p = 0.25; q = 0.75; x = 0, 1,2,3,4 
19. (a) 0.283 (b) 0.260 (c) 0.457
21. (a) 0.150 (b) 0.759 (c) 0.712
23. (a) 0.221 (b) 0.247 (c) 0.753 
25. (a) 0.089 (b) 0.017 (c) 0.106 
27. (a)
x | P(x)
0 | 0.008974
1 | 0.060355
2 | 0.173965
3 | 0.278572
4 | 0.267647
5 | 0.154291
6 | 0.049413
7 | 0.006782
(b) [Histogram: Health Insurance Deductibles; x-axis: Number of working mothers; y-axis: Probability]
Approximately symmetric
(c) The values 0, 6, and 7 are unusual because their
probabilities are less than 0.05.


29. (a)
x | P(x)
0 | 0.00064
1 | 0.01077
2 | 0.07214
3 | 0.24151
4 | 0.40426
5 | 0.27068
(b) [Histogram: Living to Age 100; x-axis: Number of adults; y-axis: Probability]
Skewed left
(c) The values 0 and 1 are unusual because their probabilities
are less than 0.05.




31. μ ≈ 5.0; σ² ≈ 1.4; σ ≈ 1.2; On average, 5 out of every
7 U.S. adults think that political correctness is a problem
in America today. The standard deviation is 1.2, so most
samples of 7 U.S. adults would differ from the mean by at
most 1.2 U.S. adults.

33. μ ≈ 6.3; σ² ≈ 1.3; σ ≈ 1.2; On average, 6.3 out of every
8 adults believe that life on other planets is possible. The
standard deviation is 1.2, so most samples of 8 adults would
differ from the mean by at most 1.2 adults.

35. μ ≈ 1.9; σ² ≈ 1.3; σ ≈ 1.1; On average, 1.9 out of every 6
U.S. employees who are late for work blame oversleeping.
The standard deviation is 1.1, so most samples of 6 U.S.
employees who are late would differ from the mean by at
most 1.1 U.S. employees.

37. 0.033 

39. (a) 0.107 


(b) 0.107  (c) The results are the same. 
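For a binomial experiment, the mean, variance, and standard deviation are μ = np, σ² = npq, and σ = √(npq). This is a minimal Python sketch. The values n = 50 and p = 0.4 are not stated in these answers. They are inferred as the only pair consistent with the printed μ = 20, σ² = 12, σ ≈ 3.5 (the answer just before Exercise 13): npq/np gives q = 0.6, so p = 0.4 and n = 20/0.4 = 50.

```python
import math


def binomial_summary(n: int, p: float) -> tuple[float, float, float]:
    """Return the mean np, variance npq, and standard deviation sqrt(npq)."""
    q = 1 - p
    mean = n * p
    variance = n * p * q
    return mean, variance, math.sqrt(variance)


# Assumed n = 50, p = 0.4 (inferred from the printed mean and variance).
mu, var, sigma = binomial_summary(50, 0.4)
print(round(mu, 1), round(var, 1), round(sigma, 1))  # 20.0 12.0 3.5
```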


Section 4.2 Activity (page 236)


1-3. Answers will vary.


Section 4.3 (page 242)


1. 0.080 3. 0.062 5. 0.175 7. 0.251 

9. In a binomial distribution, the value of x represents the 
number of successes in n trials. In a geometric distribution, 
the value of x represents the first trial that results in a 
success. 

11. (a) 0.105 (b) 0.595 (c) 0.405

13. (a) 0.036; unusual (b) 0.053 (c) 0.017; unusual


15. (a) 0.230 (b) 0.871 (c) 0.129 
17. (a) 0.0025; unusual (b) 0.0075; unusual (c) 0.98; usual 
19. (a) 0.206 (b) 0.372 (c) 0.628 
21. (a) 0.109 (b) 0.618 (c) 0.382 
23. (a) 0.322 (b) 0.513 (c) 0.809 
25. (a) 0.071 (b) 0.827 (c) 0.173 


27. (a) 0.12542 
(b) 0.12541; The results are approximately the same. 
29. (a) μ = 1000; σ² = 999,000; σ ≈ 999.5 (b) 1000 times
(c) Lose money. On average, you would win $500 once in
every 1000 times you play the lottery. So, the net gain
would be −$500.

31. (a) σ² ≈ 4.1; σ ≈ 2.0; The standard deviation is 2.0 strokes,
so most of Steven’s scores per hole differ from the mean 
by no more than 2 strokes. 

(b) 0.553 
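In a geometric distribution, x is the trial on which the first success occurs. Its formulas are:

- P(x) = p · q^(x − 1)
- μ = 1/p
- σ² = q/p²

This is a minimal Python sketch. The success probability p = 0.001 is an inference, chosen because it reproduces the printed Exercise 29(a) values μ = 1000 and σ² = 999,000.

```python
import math


def geometric_pmf(x: int, p: float) -> float:
    """P(first success on trial x) = p * q**(x - 1), for x = 1, 2, 3, ..."""
    return p * (1 - p) ** (x - 1)


def geometric_summary(p: float) -> tuple[float, float, float]:
    """Return the mean 1/p, variance q/p**2, and standard deviation sqrt(q)/p."""
    q = 1 - p
    return 1 / p, q / p ** 2, math.sqrt(q) / p


# Assumed p = 0.001 (inferred from the printed mean and variance).
mu, var, sigma = geometric_summary(0.001)
print(round(mu), round(var), round(sigma, 1))  # 1000 999000 999.5
print(round(geometric_pmf(1, 0.001), 6))       # 0.001: win on the first try
```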


Uses and Abuses for Chapter 4 (page 245)


1. At least 20; The probability of at least 20 incidents is about
0.125, whereas there is about a 0.102 chance of 15 incidents.

2. Less than 14; The probability of less than 14 incidents is
about 0.363, whereas there is about a 0.301 chance of 14 to
16 incidents.

3. Yes. The probability of 21 incidents is about 0.030, which is
less than 0.05.


Review Exercises for Chapter 4 (page 247)


1.
3.


11.


13.
15.
17.


19.


21.
23.
25.
Discrete. 
(a)
x | P(x)
0 | 0.207
1 | 0.443
2 | 0.236
3 | 0.086
4 | 0.021
5 | 0.007
(b) [Histogram: Hits per Game; x-axis: Number of hits]
Skewed right
Yes


(a) μ ≈ 2.3; σ² ≈ 2.5; σ ≈ 1.6


(b) The mean is 2.3, so the average number of accidents on a
weekday is about 2. The standard deviation is 1.6, so most
of the accidents differ from the mean by no more than
about 1 accident.

−$1

Binomial experiment 

Success: a green candy is selected 


n = 12, p = 0.16, q = 0.84,
x = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
(a) 0.191 (b) 0.891 (c) 0.700 
(a) 0.067 (b) 0.984  (c) 0.917 
(a)
x | P(x)
0 | 0.0008
1 | 0.0126
2 | 0.0798
3 | 0.2529
4 | 0.4003
5 | 0.2536
(b) [Histogram: Stay at Home Mothers With Degrees; x-axis: Number of mothers]
Skewed left
(c) The values 0 and 1 are unusual because their probabilities
are less than 0.05.
μ ≈ 1.0; σ² ≈ 0.9; σ ≈ 1.0; On average, 1 out of every 8
drivers is uninsured. The standard deviation is 1.0, so most
samples of 8 drivers would differ from the mean by at most
1 driver.


(a) 0.148  (b) 0.006; unusual (c) 0.820 
(a) 0.154 (b) 0.217 (c) 0.011; unusual 
(a) 0.085 (b) 0.410  (c) 0.430 


Quiz for Chapter 4 (page 250)


1. (a) Discrete; The number of lightning strikes that occur in
Wyoming during the month of June is a random variable
that is countable.

(b) Continuous; The fuel (in gallons) used by a jet during
takeoff is a random variable that has an infinite number
of possible outcomes and cannot be counted.

(c) Discrete; The number of die rolls required for an
individual to roll a five is a random variable that is
countable.
2. (a)
x | P(x)
0 | 0.238
1 | 0.405
2 | 0.209
3 | 0.090
4 | 0.040
5 | 0.019
(b) [Histogram: Wireless Devices per Household; x-axis: Number of wireless devices]
Skewed right


(c) μ ≈ 1.3; σ² ≈ 1.4; σ ≈ 1.2; The mean is 1.3, so the
average number of wireless devices per household is
1.3. The standard deviation is 1.2, so most households
will differ from the mean by no more than 1.2 wireless
devices.
(d) 0.058 
3. (a) 0.269 (b) 0.811 (c) 0.061 
4. (a)
x | P(x)
0 | 0.000008
1 | 0.000278
2 | 0.004262
3 | 0.034907
4 | 0.160820
5 | 0.395159
6 | 0.404567
(b) [Histogram: Successful Surgeries; x-axis: Number of patients]
Skewed left


(c) μ ≈ 5.2; σ² ≈ 0.7; σ ≈ 0.8; On average, 5.2 out of
every 6 patients have a successful surgery. The standard 
deviation is 0.8, so most samples of 6 surgeries would 
differ from the mean by at most 0.8 surgery. 

5. (a) 0.175 (b) 0.440 (c) 0.007
6. (a) 0.048 (b) 0.355 (c) 0.085
7. Event (a) is unusual because its probability is less than 0.05. 
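The summary in Quiz item 2(c) follows from two formulas: μ = Σ x P(x) and σ² = Σ (x − μ)² P(x). This minimal Python sketch uses the Quiz item 2(a) table. The printed probabilities sum to 1.001 because of rounding, so the results match the answer only after rounding to one decimal place.

```python
import math


def distribution_summary(dist: dict[int, float]) -> tuple[float, float, float]:
    """Mean, variance, and standard deviation of a discrete probability distribution."""
    mean = sum(x * p for x, p in dist.items())
    variance = sum((x - mean) ** 2 * p for x, p in dist.items())
    return mean, variance, math.sqrt(variance)


# Wireless devices per household, Quiz item 2(a)
wireless = {0: 0.238, 1: 0.405, 2: 0.209, 3: 0.090, 4: 0.040, 5: 0.019}
mu, var, sigma = distribution_summary(wireless)
print(round(mu, 1), round(var, 1), round(sigma, 1))  # 1.3 1.4 1.2
```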


Real Statistics—Real Decisions for Chapter 4 


1. (a) Sample answer: Calculate the probability of obtaining 
0 clinical pregnancies out of 10 randomly selected ART 
cycles. 

(b) Binomial. The distribution is discrete because the 
number of clinical pregnancies is countable. 


(page 252) 




2. n = 10, p = 0.33
x | P(x)
0 | 0.01823
1 | 0.08978
2 | 0.19899
3 | 0.26136
4 | 0.22528
5 | 0.13315
6 | 0.05465
7 | 0.01538
8 | 0.00284
9 | 0.00031
10 | 0.00002

Sample answer: Because P(0) ≈ 0.018, this event is unusual
but not impossible.
3. (a) Suspicious, because the probability is less than 0.05. 
(b) Not suspicious, because the probability is greater 
than 0.05. 
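The table in item 2 comes from the binomial formula P(x) = nCx · p^x · q^(n − x) with n = 10 and p = 0.33. This minimal Python sketch regenerates the table. It then flags values with a probability of 0.05 or less as unusual, which is the convention these answers use. `math.comb` requires Python 3.8 or later.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(x successes in n trials) = nCx * p**x * q**(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 10, 0.33
table = {x: binomial_pmf(x, n, p) for x in range(n + 1)}
for x, prob in table.items():
    print(x, f"{prob:.5f}")  # x = 0 prints 0.01823

# Values with probability <= 0.05 are treated as unusual in these answers.
unusual = [x for x, prob in table.items() if prob <= 0.05]
print(unusual)  # [0, 7, 8, 9, 10]
```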


Chapter 5
Section 5.1 (page 264)


1. Answers will vary. 3. 1

5. Answers will vary.
Similarities: The two curves will have the same line of
symmetry.
Differences: The curve with the larger standard deviation
will be more spread out than the curve with the smaller
standard deviation.

7. μ = 0, σ = 1

9. “The” standard normal distribution is used to describe one
specific normal distribution (μ = 0, σ = 1). “A” normal
distribution is used to describe a normal distribution with
any mean and standard deviation.

11. No, the graph is skewed left.

13. No, the graph crosses the x-axis.

15. Yes, the graph fulfills the properties of the normal
distribution.
μ = 115, σ = 15
17. 0.9032 19. 0.0228 21. 0.6429 23. 0.6026
25. 0.0311 27. 0.8389 29. 0.8302 31. 0.4979
33. 0.9756 35. 0.8808




37. 


39. 


41. 
47. 
53. 
57. 


59. 




(a) [Histogram: Life Spans of Flash Drives; x-axis: Runs (in cycles); y-axis: Frequency]

It is reasonable to assume that the life spans are 
not normally distributed because the histogram is 
asymmetric towards right. 

(b) 69,836; 16,405 

(c) The sample mean of 69,836 runs is less than the claimed 
mean, so, on average, the flash drives in the sample 
lasted for a shorter time. The sample standard deviation 
of 16,405 is greater than the claimed standard deviation, 
so the flash drives in the sample had a greater variation 
in running cycles than the manufacturer’s claim. 


(a) x = 66.50>z = 0.1 
x = 69.7557 =~ 14 
x = 72.5057 ~ 2.5 
x = 60.7537 ~ —2.2 
(b) x = 72.5 is unusual because its corresponding z-score 
(2.5) lies more than 2 standard deviations from the mean. 
x = 60.75 is unusual because its corresponding z-score 
(—2.2) lies more than 2 standard deviations from the mean. 
0.9750 43. 0.9832 45. 0.6826 (Tech: 0.6827) 
0.9898 49. 0.0005 51. 0.4115 
0.9970 (Tech: 0.9972) 55. 0.0014 


36 48 60 72 84 


The normal distribution curve is centered at its mean (60) 
and has 2 points of inflection (48 and 72) representing 
wotoa. 

(1) The area under the curve is 


agi) bata 


(Because a < b, you do not have to worry about 
division by 0.) 
(2) All of the values of the probability density function are 


positive because b is positive when a < b. 
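The answers in this section rest on two computations: the z-score z = (x − μ)/σ and areas under the standard normal curve. A minimal Python sketch using the standard library's `statistics.NormalDist`; the helper names are our own, and the example values μ = 60, σ = 12 are read from Exercise 57 (mean 60, inflection points 48 and 72):

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations x lies from the mean."""
    return (x - mu) / sigma

def area_between(z_low: float, z_high: float) -> float:
    """Area under the standard normal curve between two z-scores."""
    return STANDARD_NORMAL.cdf(z_high) - STANDARD_NORMAL.cdf(z_low)

# Exercise 57: mean 60 with inflection points at 48 and 72, so sigma = 12.
mu, sigma = 60, 12
print(z_score(72, mu, sigma))                # the upper inflection point is one sigma above the mean
print(round(area_between(-1, 1), 4))         # area within one standard deviation of the mean
print(round(STANDARD_NORMAL.cdf(1.96), 4))   # area to the left of z = 1.96
```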


Section 5.2 (page 271) 
1. 0.6179 3. 0.2912 5. 0.4324 (Tech: 0.4325) 
7. (a) 0.2611 (Tech: 0.2623) 


(b) 0.3453 (Tech: 0.3452) 

(c) 0.1190 (Tech: 0.1186) 

(d) No unusual events because all of the probabilities are 
greater than 0.05. 


9. (a) 0.2389 (Tech: 0.2386)
(b) 0.1739 (Tech: 0.1742)
(c) 0.1539 (Tech: 0.1539)
(d) No, none of these events are unusual because their
probabilities are greater than 0.05.

11. (a) 0.0228 (b) 0.6563 (Tech: 0.6564) (c) 0.0228

13. 0.2918 (Tech: 0.2914) 15. 0.0324 (Tech: 0.0325)

17. (a) 72.78% (Tech: 72.83%)
(b) 65 scores (Tech: 64 scores)

19. (a) 90.32% (b) 74.35%
(c) 5 mothers

21. Out of control, because there is a point more than three
standard deviations beyond the mean.

23. Out of control, because there are nine consecutive points
below the mean, and two out of three consecutive points lie
more than two standard deviations from the mean.


Section 5.3 (page 279)
1. 0.98 3. =1.53 5. —1.905 7. 1.205 9. —0.842 
11. 1.645 13. —1.405 15. 0.954 17. —0.38 
19. 1.99 21. —1.96, 1.96 23. 1.011 25. —0.16 
27. 1.656 29. ±1.036
31. (a) $778 million (b) $6.91 million (Tech: $6.90 million)
(c) $748 million (Tech: $745 million)

33. (a) 1315.99 kilowatt-hours (Tech: 1316.08 kilowatt-hours)
(b) 1719.67 kilowatt-hours (Tech: 1719.58 kilowatt-hours)
(c) 2671.34 kilowatt-hours (Tech: 2671.04 kilowatt-hours)

35. (a) 3.66 (b) 3.24 and 3.48

37. (a) 5.67 millions of cells per microliter
(b) 4.98 millions of cells per microliter (Tech: 4.99 millions
of cells per microliter)

39. 551.34 grams 41. 7.93 ounces


Section 5.4 (page 291)

1. 225, 4.619 3. 1022, 7589
5. False. As the size of a sample increases, the mean of the
distribution of sample means does not change.
7. False. A sampling distribution is normal when either n ≥ 30
or the population is normal.
9. (c), because μx̄ = 16.5, σx̄ = 1.19, and the graph
approximates a normal curve.


11. (a) μ = 53.2, σ ≈ 19.9

(b) Sample | Mean
19, 19 | 19
19, 48 | 33.5
19, 56 | 37.5
19, 64 | 41.5
19, 79 | 49
48, 19 | 33.5
48, 48 | 48
48, 56 | 52
48, 64 | 56
48, 79 | 63.5
56, 19 | 37.5
56, 48 | 52
56, 56 | 56
56, 64 | 60
56, 79 | 67.5
64, 19 | 41.5
64, 48 | 56
64, 56 | 60
64, 64 | 64
64, 79 | 71.5
79, 19 | 49
79, 48 | 63.5
79, 56 | 67.5
79, 64 | 71.5
79, 79 | 79

(c) μx̄ = 53.2, σx̄ ≈ 14.1
The means are equal, but the standard deviation of the
sampling distribution is smaller.

13. (a) μ = 389, σ ≈ 28.65

(b) Sample | Mean
350, 350, 350 | 350
350, 350, 399 | 366.33
350, 350, 418 | 372.67
350, 399, 350 | 366.33
350, 399, 399 | 382.67
350, 399, 418 | 389
350, 418, 350 | 372.67
350, 418, 399 | 389
350, 418, 418 | 395.33
399, 350, 350 | 366.33
399, 350, 399 | 382.67
399, 350, 418 | 389
399, 399, 350 | 382.67
399, 399, 399 | 399
399, 399, 418 | 405.33
399, 418, 350 | 389
399, 418, 399 | 405.33
399, 418, 418 | 411.67
418, 350, 350 | 372.67
418, 350, 399 | 389
418, 350, 418 | 395.33
418, 399, 350 | 389
418, 399, 399 | 405.33
418, 399, 418 | 411.67
418, 418, 350 | 395.33
418, 418, 399 | 411.67
418, 418, 418 | 418

(c) μx̄ = 389, σx̄ ≈ 16.54
The means are equal, but the standard deviation of the
sampling distribution is smaller.
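The tables in Exercises 11 and 13 list every sample drawn with replacement from a small population. A minimal Python sketch (standard library; the function name `sampling_distribution` is hypothetical) that enumerates those samples and checks the two facts stated in part (c): the mean of the sample means equals μ, and their standard deviation shrinks to σ/√n. Population standard deviations (`pstdev`) are used because the whole population is listed:

```python
from itertools import product
from math import sqrt
from statistics import fmean, pstdev

def sampling_distribution(population: list[float], n: int) -> list[float]:
    """Means of every ordered sample of size n drawn with replacement."""
    return [fmean(sample) for sample in product(population, repeat=n)]

for population, n in [([19, 48, 56, 64, 79], 2), ([350, 399, 418], 3)]:
    means = sampling_distribution(population, n)
    mu, sigma = fmean(population), pstdev(population)
    print(f"{len(means)} samples: mu = {mu:.2f}, sigma = {sigma:.2f}, "
          f"mean of means = {fmean(means):.2f}, "
          f"sd of means = {pstdev(means):.2f}, sigma/sqrt(n) = {sigma / sqrt(n):.2f}")
```

The same function reproduces Review Exercise 51 with the population {0, 1, 2, 3} and n = 2.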


15. 0.0049; unusual
17. 0.1685 (Tech: 0.1689); not unusual

19. μx̄ = 495, σx̄ ≈ 26.83
(Graph: horizontal axis "Mean score", marked 425 to 575)

21. μx̄ = 23, σx̄ = 0.26
(Graph: horizontal axis "Mean temperature (in degrees Celsius)", marked 21 to 25)

23. μx̄ = 1.64, σx̄ ≈ 0.83
(Graph: horizontal axis "Mean per capita water footprint (in mega gallons)")

25. μx̄ = 132,000, σx̄ ≈ 3042.56
(Graph: horizontal axis "Mean salary (in dollars)", marked 130,000 and 135,000)

27. n = 15: μx̄ = 495, σx̄ ≈ 30.98
n = 10: μx̄ = 495, σx̄ ≈ 37.95
(Graph: horizontal axis "Mean score", marked 400 to 600)

As the sample size decreases, the standard deviation of the
sample means increases, while the mean of the sample means
remains constant.

29. 0.4623 (Tech: 0.4645); About 46% of samples of 32 years will
have a mean gain between 200 and 500.

31. 0.0708 (Tech: 0.0702); About 7% of samples of 30 Chinese
cities will have a mean childhood asthma rate greater than
2.6%.

33. It is more likely to select a sample of 10 cities with a mean
childhood asthma prevalence less than 3.2% because the
sample of 10 has a higher probability.

35. No, it is not likely that you would have randomly sampled
75 cans with a mean less than or equal to 5 kilograms
because it is more than 4 standard deviations from the mean
of the sample means. So, the machine needs to be reset.

37. (a) 0.2119 (b) 0

39. Yes, the finite correction factor should be used; 0.6772
(Tech: 0.6755)

41. 0.0446 (Tech: 0.0448); The probability that less than 55% of
a sample of 105 residents are in favor of building a new high
school is about 4.5%. Because the probability is less than
0.05, this is an unusual event.


Section 5.4 Activity (page 296)

1-2. Answers will vary.


Section 5.5 (page 303)

(b) 0.0606 (Tech: 0.0611)


1. 
3 
5. 
9. 
11. 


13. 


15 


17. 
19. 


21. 


Cannot use normal distribution

The probability of getting fewer than 25 successes;
P(x < 24.5)

Cannot use normal distribution

5. a 6. d 7. c 8. b

The probability of getting exactly 33 successes;
P(32.5 < x < 33.5)
The probability of getting at most 150 successes;
P(x < 150.5)

(c) 0.0182

Binomial: P(5 ≤ x ≤ 7) ≈ 0.549
Normal: P(4.5 < x < 7.5) = 0.5463 (Tech: 0.5466)
The results are about the same.

Can use normal distribution; μ = 9.3, σ ≈ 2.533
Can use normal distribution

(a) 0.0491

(Graphs: horizontal axis "Number of adults")

The event in part (c) is unusual because its probability is less
than 0.05.

23. Cannot use normal distribution because np < 5.
(a) 0.0382 (b) 0.4862 (c) 0.6994
The event in part (a) is unusual because its probability is less
than 0.05.
(Graphs: horizontal axis "Number of double chargings")

(b) 0.3051

25. Can use normal distribution.
(a) 0.4013 (Tech: 0.4001)
(c) 0.6949
(Graphs: horizontal axis "Number of college graduates")

(b) 0.1867 (Tech: 0.1879)

The double-charging in exactly 80 cases is an unusual event
because its probability is less than 0.05.

Can use normal distribution

(a) 0.3936 (Tech: 0.3925) (c) 0.3999

No unusual events because all of the probabilities are
greater than 0.05.

27. (a) 0.0885 (Tech: 0.0878) (b) 0.1660 (Tech: 0.1658)
(c) 0.7324 (Tech: 0.7322)
29. Highly unlikely. Answers will vary.

31. 0.1020


Uses and Abuses for Chapter 5 (page 306)

1. (a) Not unusual; A sample mean of 112 is less than
2 standard deviations from the population mean.
(b) Unusual; A sample mean of 105 is more than 2 standard
deviations from the population mean.
2. The ages of students at a high school may not be normally
distributed.
3. Answers will vary.


Review Exercises for Chapter 5 (page 308)

1. μ = 15, σ = 3
3. Curve B has the greatest mean because its line of symmetry
occurs the farthest to the right.


5. 0.6772 7. 0.6368 9. 0.6772
11. 1 − 0.9956 = 0.0044 13. 0.4750
15. 0.3899 17. 0.1052
19. x = 17 → z = −0.66
x = 29 → z = 1.18
x = 8 → z = −2.05
x = 23 → z ≈ 0.26
21. 0.9131
23. 0.8285 25. 0.1336 27. 0.8413
29. 0.6915 31. 0.1359


33. (a) 0.4168 (Tech: 0.4173) (b) 0.1425 (Tech: 0.1407)
(c) 0.3974 (Tech: 0.3971)

35. No unusual events because all of the probabilities are
greater than 0.05.

37. −0.16 39. 2.455 (Tech: 2.457) 41. 0.44 43. 0.81

45. 119.54 feet 47. 137.80 feet (Tech: 137.81 feet)

49. 136.71 feet (Tech: 136.70 feet)

51. (a) μ = 1.5, σ ≈ 1.118

(b) Sample | Mean
0, 0 | 0
0, 1 | 0.5
0, 2 | 1
0, 3 | 1.5
1, 0 | 0.5
1, 1 | 1
1, 2 | 1.5
1, 3 | 2
2, 0 | 1
2, 1 | 1.5
2, 2 | 2
2, 3 | 2.5
3, 0 | 1.5
3, 1 | 2
3, 2 | 2.5
3, 3 | 3

(c) μx̄ = 1.5, σx̄ ≈ 0.791
The means are equal, but the standard deviation of
the sampling distribution is smaller.


53. μx̄ = 471.5, σx̄ ≈ 31.761
(Graph: horizontal axis "Mean electric power consumption (in kilowatt-hours)", marked 400 and 500)

55. (a) 0.3840 (Tech: 0.3839)
(c) 0.3557 (Tech: 0.3561)
The probabilities in parts (a) and (c) are smaller, and the
probability in part (b) is larger.

57. (a) 0.8051 (Tech: 0.8043) (b) 0.8577 (Tech: 0.8580)
(c) 0.3993 (Tech: 0.3994)

59. (a) 0.2709 (Tech: 0.2710) (b) 0.1112 (Tech: 0.1113)

61. Can use normal distribution; μ = 15, σ ≈ 1.936

63. The probability of getting at least 28 successes; P(x > 27.5)

65. The probability of getting exactly 30 successes;
P(29.5 < x < 30.5)

67. The probability of getting less than 50 successes; P(x < 49.5)
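Exercises 61–67 use the normal approximation to the binomial distribution with a continuity correction: an integer count is widened by 0.5 on each side before the normal curve with μ = np and σ = √(npq) is used. A minimal Python sketch comparing the exact binomial probability with the approximation. The values n = 20 and p = 0.75 are our own illustrative assumption; they happen to give μ = 15 and σ ≈ 1.936 as in Exercise 61, but the exercise's actual n and p are not given here. The helper names are hypothetical:

```python
from math import comb, sqrt
from statistics import NormalDist

def exact_binomial(n: int, p: float, low: int, high: int) -> float:
    """Exact P(low <= x <= high) for a binomial random variable."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(low, high + 1))

def normal_approx(n: int, p: float, low: int, high: int) -> float:
    """P(low <= x <= high) from the normal curve with a continuity correction."""
    q = 1 - p
    if n * p < 5 or n * q < 5:
        raise ValueError("np and nq must both be at least 5")
    curve = NormalDist(mu=n * p, sigma=sqrt(n * p * q))
    # Widen the discrete range by 0.5 on each side: P(low - 0.5 < x < high + 0.5).
    return curve.cdf(high + 0.5) - curve.cdf(low - 0.5)

n, p = 20, 0.75  # illustrative values only
print(f"mu = {n * p}, sigma = {sqrt(n * p * (1 - p)):.3f}")
print(f"P(x = 15): exact {exact_binomial(n, p, 15, 15):.4f}, "
      f"normal {normal_approx(n, p, 15, 15):.4f}")
```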

69. Can use normal distribution
(a) 0.0384 (Tech: 0.0385)
(b) 0.1898 (Tech: 0.1923)
(b) 0.0798 (Tech: 0.0818)
(c) 0.0188 (Tech: 0.0190)
(Graphs: horizontal axis "Number of adults", marked 12 to 32, with x = 24.5 indicated)

The events in parts (a) and (c) are unusual because their
probabilities are less than 0.05.


Quiz for Chapter 5 (page 312)

1. (a) 0.9535 (b) 0.9871 (c) 0.3616
(d) 0.7703 (Tech: 0.7702)

2. (a) 0.0233 (Tech: 0.0231) (b) 0.9929 (Tech: 0.9928)
(c) 0.9198 (Tech: 0.9199) (d) 0.3607 (Tech: 0.3610)

3. 0.0475 (Tech: 0.0478); Yes, the event is unusual because its
probability is less than 0.05.

4. 0.2586 (Tech: 0.2611); No, the event is not unusual because
its probability is greater than 0.05.

5. 21.19% 6. 503 people (Tech: 505 people)
7. ≈ 125 8. 80

9. 0.0049; About 0.5% of samples of 60 people will have a
mean IQ score greater than 105. This is a very unusual event.

10. More likely to select one person with an IQ score greater
than 105 because the standard error of the mean is less than
the standard deviation.

11. Can use normal distribution; μ = 40, σ ≈ 5.797

12. (a) 0.5359 (Tech: 0.5344) (b) 0.7823 (Tech: 0.7812)
(c) 0.0277 (Tech: 0.0266)
The event in part (c) is unusual because its probability is less
than 0.05.


Real Statistics—Real Decisions for Chapter 5 (page 314) 
(a) 0.4207 (b) 0.9988 
(a) 0.3264 (Tech: 0.3274) (b) 0.6944 (Tech: 0.6957)


2. 


3. 


Cumulative Review for Chapters 3–5 (page 316)

1. (a) np = 18.3 ≥ 5, nq = 11.7 ≥ 5

(b) 0.0778 (Tech: 0.0775)

(c) Yes, because the probability is less than 0.05.

2. (a) 2.46 (b) 1.95 (c) 1.40 (d) 2.46

The size of a household on average is about 2.5 persons. The
standard deviation is 1.4, so most households differ from the
mean by no more than about 1 person.

3. (a) 2.45 (b) 2.29 (c) 1.51 (d) 2.45

The number of fouls per game for Garrett Temple is about
2.45 fouls. The standard deviation is 1.5, so most of Temple’s
games differ from the mean by no more than about 1 or
2 fouls.

4. (a) 0.777 (b) 0.514

5. (a) 43,680 (b) 0.019

6. 0.7642 7. 0.0010 8. 0.7995 9. 0.4984

10. 0.2862 11. 0.5905

12. (a) 0.0367 (b) 0.3735 (c) 0.0029

(d) The events in parts (a) and (c) are unusual because their
probabilities are less than 0.05.

13. (a) 0.0049 (b) 0.0149 (c) 0.9046

14. (a) 0.277 (b) 0.886

(c) Dependent.
P(being a public school teacher | having 20 years or
more of full-time teaching experience) ≠ P(being a
public school teacher)

(d) 0.413

(c) 0.626

(c) randomly selected sample mean
Answers will vary.


15. (a) μx̄ = 70, σx̄ ≈ 0.190 (b) 0.0006
(Graph: horizontal axis "Initial pressure (in psi)", marked 69.2, 70, 70.8)

16. (a) 0.0548 (b) 0.6547

17. (a) 495 (b) 0.002

(c) 52.2 months

18. (a)
x | P(x)
0 | 0.000006
1 | 0.0001
2 | 0.0014
3 | 0.0090
4 | 0.0368
5 | 0.1029
6 | 0.2001
7 | 0.2668
8 | 0.2335
9 | 0.1211
10 | 0.0282

(b) Anticipating Major Cyberattacks
(Histogram; horizontal axis: Number of adults, 0 to 10; vertical axis: Probability)
Skewed left

(c) The values 0, 1, 2, 3, 4, and 10 are unusual because their
probabilities are less than 0.05.


Chapter 6

Section 6.1 (page 327)

1. You are more likely to be correct using an interval estimate
because it is unlikely that a point estimate will exactly equal
the population mean.

3. d; As the level of confidence increases, zc increases, causing
wider intervals.

5. 1.28 7 VAS 9, —0.47 11. 1.76 13. 1.861 

15. 0.192 17. c 18. d 19. b 20. a 

21. (12.0, 12.6) 23. (9.7, 11.3) 25. E = 1.4, x̄ = 13.4

27. E = 0.17, x̄ = 1.88 29. 126 31. 7
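Exercises 25 and 27 run the interval x̄ ± E backwards: the point estimate is the midpoint of the confidence interval and the margin of error is half its width. A minimal Python sketch (helper names hypothetical). The interval (12.0, 14.8) is the one implied by Exercise 25's answer, since 13.4 − 1.4 = 12.0 and 13.4 + 1.4 = 14.8:

```python
def interval_from_estimate(point_estimate: float, margin: float) -> tuple[float, float]:
    """Confidence interval x-bar +/- E."""
    return point_estimate - margin, point_estimate + margin

def estimate_from_interval(left: float, right: float) -> tuple[float, float]:
    """Recover (x-bar, E): the midpoint and half the width of the interval."""
    return (left + right) / 2, (right - left) / 2

x_bar, e = estimate_from_interval(12.0, 14.8)
print(f"x-bar = {x_bar:.1f}, E = {e:.1f}")
print(interval_from_estimate(13.4, 1.4))
```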


‘B= 105.2 = 2615 
35. (58.57, 59.89); (58.44, 60.02)


With 90% confidence, you can say that the population mean 
price is between $58.57 and $59.89. With 95% confidence, 
you can say that the population mean price is between 
$58.44 and $60.02. The 95% CI is wider. 


37. (6.98, 13.46); (6.36, 14.08)

With 90% confidence, you can say that the population mean
rainfall is between 6.98 and 13.46 millimeters. With 95%
confidence, you can say that the population mean rainfall is
between 6.36 and 14.08 millimeters. The 95% CI is wider.

39. Yes; The margin of error is small (E = 0.67).

41. Yes; The right endpoint of the 95% CI is 14.08.

43. (a) An increase in the level of confidence will widen the
confidence interval, and the wider the interval, the less
certain you can be about a point estimate.

(b) An increase in the sample size will narrow the confidence
interval because it decreases the standard error.

(c) An increase in the population standard deviation will
widen the confidence interval because small standard
deviations produce more precise intervals, which are
narrower.

45. (151.5, 193.9); (139.5, 205.9)

With 90% confidence, you can say that the population mean
quantity is between 151.5 and 193.9 milligrams. With 99%
confidence, you can say that the population mean quantity
is between 139.5 and 205.9 milligrams. The 99% CI is wider.

47. 89

49. (a) 66 servings

(b) No; Yes; The 95% CI is (28.252, 29.748). If the population
mean is within 3% of the sample mean, then it falls
outside the CI. If the population mean is within 0.3% of
the sample mean, then it falls within the CI.

51. (a) 7 cans

(b) Yes; The 90% CI is (1273, 128.2) and 128 ounces falls
within that interval.

53. (a) 74 balls

(b) Yes; The 99% CI is (27.360, 27.640) and there are amounts
less than 27.6 inches that fall within that interval.

55. Sample answer: A 99% CI may not be practical to use in
all situations. It may produce a CI so wide that it has no
practical application.

57. (a) 0.707 (b) 0.949 (c) 0.962 (d) 0.975
(e) 0.711 (f) 0.937 (g) 0.964 (h) 0.979

The finite population correction factor approaches 1 as the
sample size decreases and the population size remains the
same.

The finite population correction factor approaches 1 as the
population size increases and the sample size remains the
same.
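Exercise 57 uses the finite population correction factor √((N − n)/(N − 1)), where N is the population size and n is the sample size. A minimal Python sketch (function name hypothetical; the N and n values below are illustrative, not the exercise's data) showing both trends stated above:

```python
from math import sqrt

def finite_population_correction(population_size: int, sample_size: int) -> float:
    """sqrt((N - n) / (N - 1)), the factor applied to the standard error without replacement."""
    if population_size < 2 or not 1 <= sample_size <= population_size:
        raise ValueError("need N >= 2 and 1 <= n <= N")
    return sqrt((population_size - sample_size) / (population_size - 1))

# Fixed population, shrinking sample: the factor rises toward 1.
for n in (500, 100, 10):
    print(f"N = 1000, n = {n}: {finite_population_correction(1000, n):.3f}")

# Fixed sample, growing population: the factor also rises toward 1.
for big_n in (200, 2000, 20000):
    print(f"N = {big_n}, n = 100: {finite_population_correction(big_n, 100):.3f}")
```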
59. Sample answer:
E = zc σ/√n    Write the original equation.
E√n = zc σ    Multiply each side by √n.
√n = zc σ/E    Divide each side by E.
n = (zc σ/E)²    Square each side.
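The last line of the derivation is the minimum sample size formula for estimating μ. Because n must be a whole number, the result is always rounded up so that the margin of error does not exceed E. A minimal Python sketch (function name hypothetical); the values zc = 1.96, σ = 1.0, and E = 0.2 are illustrative assumptions, not data from an exercise:

```python
from math import ceil

def min_sample_size_mean(z_c: float, sigma: float, margin: float) -> int:
    """Smallest whole n with z_c * sigma / sqrt(n) <= E, i.e. (z_c * sigma / E)^2 rounded up."""
    if margin <= 0 or sigma <= 0:
        raise ValueError("sigma and the margin of error must be positive")
    return ceil((z_c * sigma / margin) ** 2)

# 95% confidence (z_c = 1.96), sigma = 1.0, margin of error E = 0.2
print(min_sample_size_mean(1.96, 1.0, 0.2))
```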


Section 6.2 (page 337)

1. 1.833 3. 2.947 5. 2.664 7. 0.686 9. (10.9, 14.1)




11. (4.1,45) 13. E=3.7,x=184 15. E=95,x = 741 
17. 6.0; (29.5, 41.5); With 95% confidence, you can say that
the population mean commute time is between 29.5 and
41.5 minutes.

19. 153.83; (372.67, 680.33); With 95% confidence, you can
say that the population mean cell phone price is between
$372.67 and $680.33.

21. 6.4; (29.1, 41.9); With 95% confidence, you can say that
the population mean commute time is between 29.1 and
41.9 minutes. This confidence interval is slightly wider than
the one found in Exercise 17.


23. Yes 25. (a) 1185 (b) 168.1 (c) (1034.3, 1335.7)
27. (a) 7.49 (b) 1.64 (c) (6.28, 8.70) 29. No

31. (a) 278,430 (b) 56,769 (c) (253,813.07, 303,046.93)
33. No

35. Use a t-distribution because σ is unknown and n ≥ 30.

(26.0, 29.4); With 95% confidence, you can say that the
population mean BMI is between 26.0 and 29.4.

37. Neither distribution can be used because n < 30 and the
mileages are not normally distributed.

39. Yes; Half the sample mean is 0.202, which falls within the
confidence interval.

41. No; They are not making good tennis balls because the
t-value for the sample is t = 10, which is not between
−t0.99 = −2.797 and t0.99 = 2.797.


Section 6.2 Activity (page 340)

1-2. Answers will vary.


Section 6.3 (page 347)

1. False. To estimate the value of p, the population proportion
of successes, use the point estimate p̂ = x/n.
3. 0.060, 0.940 5. 0.650, 0.350
7. E = 0.014, p̂ = 0.919 9. E = 0.042, p̂ = 0.554
11. (0.573, 0.607); (0.570, 0.610)
With 90% confidence, you can say that the population
proportion of U.S. adults who say they have made a New
Year’s resolution is between 57.3% and 60.7%. With 95%
confidence, you can say it is between 57.0% and 61.0%. The
95% confidence interval is slightly wider.
13. (0.663, 0.737)
With 99% confidence, you can say that the population
proportion of U.S. adults who say they think police officers
should be required to wear body cameras while on duty is
between 66.3% and 73.7%.
15. (0.030, 0.031)
17. (a) 601 adults (b) 451 adults
(c) Having an estimate of the population proportion
reduces the minimum sample size needed.
19. (a) 752 adults (b) 295 adults
(c) Having an estimate of the population proportion
reduces the minimum sample size needed.
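Exercises 17 and 19 use the minimum sample size formula for a proportion, n = p̂q̂(zc/E)², with p̂ = q̂ = 0.5 when no preliminary estimate is available. A minimal Python sketch (function name hypothetical). The inputs zc = 1.96 with E = 0.04, and zc = 1.645 with E = 0.03, are our assumptions, chosen because they reproduce the 601 and 752 figures in part (a); the answer key does not state them:

```python
from math import ceil

def min_sample_size_proportion(z_c: float, margin: float, p_hat: float = 0.5) -> int:
    """Smallest whole n with n >= p_hat * q_hat * (z_c / E)^2.

    The default p_hat = 0.5 maximizes p_hat * q_hat, so it gives the most
    conservative n when no preliminary estimate of p is available.
    """
    q_hat = 1 - p_hat
    return ceil(p_hat * q_hat * (z_c / margin) ** 2)

print(min_sample_size_proportion(1.96, 0.04))   # no preliminary estimate
print(min_sample_size_proportion(1.645, 0.03))  # no preliminary estimate
```

Passing a preliminary estimate other than 0.5 can only shrink p̂q̂, which is why part (c) of each exercise finds a smaller minimum sample size.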


21. Yes; It falls within both confidence intervals.
23. No; The minimum sample size needed is 451 adults.
25. United States: (0.282, 0.358)
Canada: (0.177, 0.243)
France: (0.215, 0.285)
Japan: (0.459, 0.541)
Australia: (0.103, 0.157)
27. (a) Expect to stay at first employer for 3 or more years:
(0.670, 0.710)
Completed an apprenticeship or internship: (0.660, 0.700)
Employed in field of study: (0.629, 0.671)
Feel underemployed: (0.488, 0.532)
Prefer to work for a large company: (0.125, 0.155)
(b) Expect to stay at first employer for 3 or more years:
(0.663, 0.717)
Completed an apprenticeship or internship: (0.653, 0.707)
Employed in field of study: (0.623, 0.677)
Feel underemployed: (0.481, 0.539)
Prefer to work for a large company: (0.120, 0.160)
29. (0.666, 0.734) is approximately a 98.1% CI.
31. (0.68, 0.74) is approximately a 96.3% CI.
33. (0.45, 0.49) is approximately a 98.3% CI.
(0.51, 0.55) is approximately a 98.3% CI.
35. If np < 5 or nq < 5, the sampling distribution of p̂ may not
be normally distributed, so zc cannot be used to calculate the
confidence interval.
37.
p̂ | q̂ | p̂q̂ | p̂ | q̂ | p̂q̂
0.0 | 1.0 | 0.00 | 0.45 | 0.55 | 0.2475
0.1 | 0.9 | 0.09 | 0.46 | 0.54 | 0.2484
0.2 | 0.8 | 0.16 | 0.47 | 0.53 | 0.2491
0.3 | 0.7 | 0.21 | 0.48 | 0.52 | 0.2496
0.4 | 0.6 | 0.24 | 0.49 | 0.51 | 0.2499
0.5 | 0.5 | 0.25 | 0.50 | 0.50 | 0.2500
0.6 | 0.4 | 0.24 | 0.51 | 0.49 | 0.2499
0.7 | 0.3 | 0.21 | 0.52 | 0.48 | 0.2496
0.8 | 0.2 | 0.16 | 0.53 | 0.47 | 0.2491
0.9 | 0.1 | 0.09 | 0.54 | 0.46 | 0.2484
1.0 | 0.0 | 0.00 | 0.55 | 0.45 | 0.2475

p̂ = 0.5 gives the maximum value of p̂q̂.
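The table in Exercise 37 shows why p̂ = 0.5 is used when no preliminary estimate is available: the product p̂q̂ = p̂(1 − p̂) peaks at 0.25 when p̂ = 0.5, which makes the sample size formula as conservative as possible. A minimal Python sketch that rebuilds both halves of the table and locates the maximum:

```python
def pq(p_hat: float) -> float:
    """Product p_hat * q_hat, where q_hat = 1 - p_hat."""
    return p_hat * (1 - p_hat)

coarse = [i / 10 for i in range(11)]        # 0.0, 0.1, ..., 1.0
fine = [0.45 + i / 100 for i in range(11)]  # 0.45, 0.46, ..., 0.55

for a, b in zip(coarse, fine):
    print(f"{a:.1f} | {1 - a:.1f} | {pq(a):.2f} | {b:.2f} | {1 - b:.2f} | {pq(b):.4f}")

best = max(coarse + fine, key=pq)
print(f"maximum p_hat*q_hat = {pq(best):.4f} at p_hat = {best:.2f}")
```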


Section 6.3 Activity (page 351)

1-2. Answers will vary.


Section 6.4 (page 356)

1. Yes
3. χ²R = 14.067, χ²L = 2.167
5. χ²R = 32.852, χ²L = 8.907
7. χ²R = 52.336, χ²L = 13.121
9. (a) (7.33, 20.89) (b) (2.71, 4.57)
11. (a) (7.55, 24.01) (b) (2.7, 4.9)

13. (a) (0.1615, 0.5136) (b) (0.4018, 0.7167)

With 90% confidence, you can say that the population
variance is between 0.1615 and 0.5136, and the population
standard deviation is between 0.4018 and 0.7167 centimeters.

15. (a) (181.50, 976.54) (b) (13.47, 31.25)

With 99% confidence, you can say that the population
variance is between 181.50 and 976.54, and the population
standard deviation is between 13.47 and 31.25 thousand
dollars.

17. (a) (5.46, 45.70) (b) (2.34, 6.76)

With 99% confidence, you can say that the population
variance is between 5.46 and 45.70, and the population
standard deviation is between 2.34 and 6.76 days.

19. (a) (128, 492) (b) (11, 22)

With 95% confidence, you can say that the population
variance is between 128 and 492, and the population standard
deviation is between 11 and 22 grains per gallon.

21. (a) (0.0986, 0.3137) (b) (0.314, 0.560)

With 90% confidence, you can say that the population
variance is between 0.0986 and 0.3137, and the population
standard deviation is between 0.314 and 0.560 hours.

23. (a) (12.8, 57.1) (b) (3.6, 7.6)

With 99% confidence, you can say that the population
variance is between 12.8 and 57.1, and the population standard
deviation is between 3.6 and 7.6 minutes.

25. No, because all of the values in the confidence interval are
greater than 0.35.

27. No, because 0.50 is contained in the confidence interval.

29. Sample answer: Unlike a confidence interval for a population
mean or proportion, a confidence interval for a population
variance does not have a margin of error. The left and right
endpoints must be calculated separately.
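As the sample answer to Exercise 29 notes, each endpoint of the interval for σ² is computed on its own: ((n − 1)s²/χ²R, (n − 1)s²/χ²L), and taking square roots gives the interval for σ. A minimal Python sketch (helper names hypothetical). The critical values 14.067 and 2.167 are the ones listed in Exercise 3, which correspond to 7 degrees of freedom (so n = 8); the sample variance s² = 1.0 is a made-up illustration:

```python
from math import sqrt

def variance_interval(n: int, s_squared: float, chi2_right: float, chi2_left: float) -> tuple[float, float]:
    """Confidence interval for sigma^2: ((n - 1)s^2 / chi2_R, (n - 1)s^2 / chi2_L)."""
    numerator = (n - 1) * s_squared
    # Dividing by the larger critical value gives the smaller (left) endpoint.
    return numerator / chi2_right, numerator / chi2_left

def std_dev_interval(n: int, s_squared: float, chi2_right: float, chi2_left: float) -> tuple[float, float]:
    """Square roots of the variance endpoints give the interval for sigma."""
    low, high = variance_interval(n, s_squared, chi2_right, chi2_left)
    return sqrt(low), sqrt(high)

# d.f. = 7 (n = 8) critical values from Exercise 3; s^2 = 1.0 is illustrative only.
print(variance_interval(8, 1.0, 14.067, 2.167))
print(std_dev_interval(8, 1.0, 14.067, 2.167))
```

The interval is not symmetric about s² because the chi-square distribution is skewed right, which is why no single margin of error exists.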


Uses and Abuses for Chapter 6 (page 358) 
1-2. Answers will vary. 3. (a) No (b) Yes 
Review Exercises for Chapter 6 (page 360) 


1. (a) 119.05 (b) 6.5

3. (112.6, 125.6); With 90% confidence, you can say that
the population mean systolic blood pressure is between 112.6 and
125.6 mmHg.

5. E = 6.125, x̄ = 36.375 7. 38 people 9. 1.796

11. 1.341 13. 5.96

15. 1.39

17. (146, 184); With 90% confidence, you can say that the
population mean height is between 146 and 184 feet.

19. (a) 0.720, 0.280

(b) (0.697, 0.743); (0.692, 0.747)

(c) With 90% confidence, you can say that the population
proportion of U.S. adults who say they want the U.S. to
play a leading or major role in global affairs is between
69.7% and 74.3%. With 95% confidence, you can say
it is between 69.2% and 74.7%. The 95% confidence
interval is slightly wider.

21. (a) 0.530, 0.470 (b) (0.513, 0.547); (0.509, 0.551)

(c) With 90% confidence, you can say that the population
proportion of U.S. adults who think antibiotics are
effective against viral infections is between 51.3% and
54.7%. With 95% confidence, you can say it is between
50.9% and 55.1%. The 95% confidence interval is
slightly wider.

23. No; It falls outside both confidence intervals.

25. (a) 385 adults (b) 335 adults

(c) Having an estimate of the population proportion
reduces the minimum sample size needed.

27. χ²R = 23.685, χ²L = 6.571 29. χ²R = 32.852, χ²L = 8.907

31. (a) (185.1, 980.8) (b) (13.6, 31.3)

With 95% confidence, you can say that the population
variance is between 185.1 and 980.8, and the population
standard deviation is between 13.6 and 31.3 knots.


Quiz for Chapter 6 (page 362)

1. (a) 2.598 (b) 0.123

(c) (2.475, 2.721); With 95% confidence, you can say that
the population mean winning time is between 2.475 and
2.721 hours.

(d) No; It falls outside the confidence interval.

2. 42 champions

3. (a) x̄ = 6.61, s ≈ 3.38

(b) (4.65, 8.57); With 90% confidence, you can say that the
population mean amount of time is between 4.65 and
8.57 minutes.

(c) (4.79, 8.43); With 90% confidence, you can say that the
population mean amount of time is between 4.79 and
8.43 minutes. This confidence interval is narrower than
the one in part (b).

4. (109,990, 156,662); With 95% confidence, you can say that
the population mean annual earnings is between $109,990
and $156,662.

5. Yes

6. (a) 0.740

(b) (0.717, 0.763); With 90% confidence, you can say that the
population proportion of U.S. adults who say that the
energy situation in the United States is very or fairly
serious is between 71.7% and 76.3%.

(c) No; The values fall outside the confidence interval.

(d) 798 adults

7. (a) (5.41, 38.08)

(b) (2.32, 6.17); With 95% confidence, you can say that
the population standard deviation is between 2.32 and
6.17 minutes.


Real Statistics—Real Decisions for Chapter 6 (page 364)

1.


(a) Yes, there has been a change in the mean concentration 
level because the confidence interval for Year 1 does not 
overlap the confidence interval for Year 2. 

(b) No, there has not been a change in the mean 
concentration level because the confidence interval for 
Year 2 overlaps the confidence interval for Year 3. 


(c) Yes, there has been a change in the mean concentration
level because the confidence interval for Year 1 does not
overlap the confidence interval for Year 3.

The concentrations of cyanide in the drinking water have
increased over the three-year period.

The width of the confidence interval for Year 2 may have
been caused by greater variation in the levels of cyanide
than in other years, which may be the result of outliers.

Increase the sample size.

Answers will vary.

(a) Sample answer: The sampling distribution of the sample
means was used because the “mean concentration”
was used. The sample mean is the most unbiased point
estimate of the population mean.

(b) Sample answer: No, because typically σ is unknown.
They could have used the sample standard deviation.


Chapter 7

Section 7.1 (page 381)

1. The two types of hypotheses used in a hypothesis test are
the null hypothesis and the alternative hypothesis.
The alternative hypothesis is the complement of the null
hypothesis.

3. You can reject the null hypothesis, or you can fail to reject
the null hypothesis.

5. False. In a hypothesis test, you assume the null hypothesis is
true.

7. True

9. False. A small P-value in a test will favor rejection of the null
hypothesis.

11. H0: μ ≤ 645 (claim); Ha: μ > 645

13. H0: σ = 5; Ha: σ ≠ 5 (claim)

15. H0: p ≥ 0.45; Ha: p < 0.45 (claim)

17. c; H0: μ = 3 18. d; H0: μ = 3
19. b; H0: μ = 3 20. a; H0: μ = 2
(Graphs: number lines marked 1 to 4)

21. Right-tailed 23. Two-tailed

25. σ ≤ 225
H0: σ ≤ 225 (claim); Ha: σ > 225

27. μ > 4
H0: μ ≤ 4; Ha: μ > 4 (claim)

29. p = 0.73
H0: p = 0.73 (claim); Ha: p ≠ 0.73

31. A type I error will occur when the actual proportion of new
customers who return to place their next order is at least
0.75, but you reject H0: p ≥ 0.75.

A type II error will occur when the actual proportion of new
customers who return to place their next order is less than
0.75, but you fail to reject H0: p ≥ 0.75.


33. A type I error will occur when the actual standard deviation
of the length of time to bowl an over is less than or equal to
4 minutes, but you reject H0: σ ≤ 4.

A type II error will occur when the actual standard
deviation of the length of time to bowl an over is greater
than 4 minutes, but you fail to reject H0: σ ≤ 4.

35. A type I error will occur when the actual proportion of
encrypted data that remains protected is at least 0.98, but
you reject H0: p ≥ 0.98.

A type II error will occur when the actual proportion of
encrypted data that remains protected is less than 0.98, but
you fail to reject H0: p ≥ 0.98.

37. H0: The mean number of glasses that break per production
cycle is less than or equal to 3 glasses.

Ha: The mean number of glasses that break per production
cycle is greater than 3 glasses.

H0: μ ≤ 3; Ha: μ > 3

Right-tailed because the alternative hypothesis contains >.
(Graph: P-value area in the right tail, beyond z)

39. H0: The percentage of IT students getting employed is 95%.

Ha: The percentage of IT students getting employed is
not 95%.

H0: p = 0.95; Ha: p ≠ 0.95

Two-tailed because the alternative hypothesis contains ≠.
(Graph: P-value areas in both tails, beyond −z and z)

41. H0: The standard deviation of a jockey clearing the obstacles
is greater than or equal to 2.

Ha: The standard deviation of a jockey clearing the obstacles
is less than 2.

H0: σ ≥ 2; Ha: σ < 2

Left-tailed because the alternative hypothesis contains <.
(Graph: P-value area in the left tail)

43. Null hypothesis

(a) There is enough evidence to reject the researcher’s
claim that the standard deviation of the life span of a
brand of air conditioner is at most 4.6 years.

(b) There is not enough evidence to reject the researcher’s
claim that the standard deviation of the life span of a
brand of air conditioner is at most 4.6 years.

45. Alternative hypothesis

(a) There is enough evidence to support the scientist’s claim
that the mean number of migratory bird species is less
than 3,800.

(b) There is not enough evidence to support the scientist’s
claim that the mean number of migratory bird species is
less than 3,800.


47. Null hypothesis

(a) There is enough evidence to reject the report’s claim
that at least 65% of individuals convicted of terrorism
or terrorism-related offenses in the United States are
foreign born.

(b) There is not enough evidence to reject the report’s claim
that at least 65% of individuals convicted of terrorism
or terrorism-related offenses in the United States are
foreign born.

49. H0: μ ≥ 75; Ha: μ < 75

51. (a) H0: μ ≥ 46; Ha: μ < 46

(b) H0: μ ≤ 46; Ha: μ > 46

53. If you decrease α, then you are decreasing the probability
that you will reject H0. Therefore, you are increasing the
probability of failing to reject H0. This could increase β, the
probability of failing to reject H0 when H0 is false.

55. Yes; If the P-value is less than α = 0.05, then it is also less
than α = 0.10.

57. (a) Fail to reject H0 because the confidence interval includes
values greater than 70.

(b) Reject H0 because the confidence interval is located
entirely to the left of 70.

(c) Fail to reject H0 because the confidence interval includes
values greater than 70.

59. (a) Reject H0 because the confidence interval is located
entirely to the right of 0.20.

(b) Fail to reject H0 because the confidence interval includes
values less than 0.20.

(c) Fail to reject H0 because the confidence interval includes
values less than 0.20.


Section 7.2 (page 395)

1. The z-test using a P-value compares the P-value with the
level of significance α. In the z-test using rejection region(s),
the test statistic is compared with critical values.

3. (a) Fail to reject H0. (b) Reject H0. (c) Reject H0.

5. (a) Fail to reject H0. (b) Fail to reject H0.
(c) Fail to reject H0.

7. (a) Fail to reject H0. (b) Fail to reject H0.
(c) Reject H0.

9. P = 0.0934; Reject H0.

11. P = 0.0069; Reject H0.

13. P = 0.0930; Fail to reject H0.

15. (a) P = 0.0089 (b) P = 0.3050
The larger P-value corresponds to the larger area.

17. Fail to reject H0.

19. Critical value: z0 = −1.88; Rejection region: z < −1.88


21. Critical value: z₀ = 1.645; Rejection region: z > 1.645

23. Critical values: −z₀ = −2.33, z₀ = 2.33
Rejection regions: z < −2.33, z > 2.33

25. (a) Fail to reject H₀ because z < 1.285.

(b) Fail to reject H₀ because z < 1.285.

(c) Fail to reject H₀ because z < 1.285.

(d) Reject H₀ because z > 1.285.

27. Reject H₀. There is enough evidence at the 5% level of
significance to reject the claim.

29. Fail to reject H₀. There is not enough evidence at the 3%
level of significance to support the claim.

31. (a) The claim is “the mean total score for the school’s
applicants is more than 499.”

H₀: μ ≤ 499; Hₐ: μ > 499 (claim)

(b) 2.83 (c) 0.0023 (d) Reject H₀.

(e) There is enough evidence at the 1% level of significance
to support the report’s claim that the mean total score
for the school’s applicants is more than 499.

33. (a) The claim is “the mean winning time for Boston
Marathon women’s open division champions is at least
2.68 hours.”

H₀: μ ≥ 2.68 (claim); Hₐ: μ < 2.68

(b) −1.37 (c) 0.0853 (d) Fail to reject H₀.

(e) There is not enough evidence at the 5% level of
significance to reject the statistician’s claim that the
mean winning time for Boston Marathon women’s
open division champions is at least 2.68 hours.

35. (a) The claim is “the mean height of top-rated roller
coasters is 160 feet.”

H₀: μ = 160 (claim); Hₐ: μ ≠ 160

(b) 0.39 (c) 0.6992 (d) Fail to reject H₀.

(e) There is not enough evidence at the 5% level of
significance to reject the claim that the mean height of
top-rated roller coasters is 160 feet.

37. (a) The claim is “the mean caffeine content per 12-ounce
bottle of a population of caffeinated soft drinks is
37.7 milligrams.”

H₀: μ = 37.7 (claim); Hₐ: μ ≠ 37.7

(b) −z₀ = −2.575, z₀ = 2.575
Rejection regions: z < −2.575, z > 2.575

(c) −0.72 (d) Fail to reject H₀.

(e) There is not enough evidence at the 1% level of
significance to reject the consumer research
organization’s claim that the mean caffeine content
per 12-ounce bottle of a population of caffeinated soft
drinks is 37.7 milligrams.
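Answers 31 and 33 report P-values for z-tests, and answer 43 below notes that a standardized test statistic inside the rejection region always goes with P < α. A minimal Python sketch using the standard library's `statistics.NormalDist` can reproduce those P-values and show that the P-value method and the rejection-region method reach the same decision. The helper names (`z_p_value`, `z_critical`, `in_rejection_region`) are illustrative; the z and α values come from answers 31, 33, and 41.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def z_p_value(z, tail):
    """P-value of a standardized test statistic z.
    tail is 'left', 'right', or 'two'."""
    if tail == "left":
        return STD.cdf(z)
    if tail == "right":
        return 1 - STD.cdf(z)
    if tail == "two":
        return 2 * (1 - STD.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right', or 'two'")


def z_critical(alpha, tail):
    """Critical value z0 for level alpha (for 'two', use -z0 and z0)."""
    if tail == "left":
        return STD.inv_cdf(alpha)
    if tail == "right":
        return STD.inv_cdf(1 - alpha)
    if tail == "two":
        return STD.inv_cdf(1 - alpha / 2)
    raise ValueError("tail must be 'left', 'right', or 'two'")


def in_rejection_region(z, alpha, tail):
    """True when z falls in the rejection region for level alpha."""
    z0 = z_critical(alpha, tail)
    if tail == "left":
        return z < z0
    if tail == "right":
        return z > z0
    return abs(z) > z0


# (z, alpha, tail) taken from answers 31, 33, and 41
for z, alpha, tail in [(2.83, 0.01, "right"), (-1.37, 0.05, "left"),
                       (-1.28, 0.11, "left")]:
    p = z_p_value(z, tail)
    print(z, round(p, 4), p < alpha, in_rejection_region(z, alpha, tail))
```

In each row, the two boolean columns should agree, which is the relationship stated in answer 43.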


ODD ANSWERS A69


39. (a) The claim is “the mean sugar content in each of the cookies
produced by a manufacturer is no more than 18%.”

H₀: μ ≤ 18 (claim); Hₐ: μ > 18

(b) z₀ = 2.05; Rejection region: z > 2.05

(c) 1.87 (d) Fail to reject H₀.

(e) There is not enough evidence at the 2% level of significance
to reject the claim that the mean sugar content in each
of the manufacturer’s cookies is no more than 18%.

41. (a) The claim is “the mean life of a lamp is at least
10,000 hours.”

H₀: μ ≥ 10,000 (claim); Hₐ: μ < 10,000

(b) z₀ = −1.23; Rejection region: z < −1.23

(c) −1.28 (d) Reject H₀.

(e) There is enough evidence at the 11% level of significance
to reject the lamp manufacturer’s claim that the mean
life of fluorescent lamps is at least 10,000 hours.

43. Outside; when the standardized test statistic is inside the
rejection region, P < α.


Section 7.3 (page 405)


1. Specify the level of significance α and the degrees of
freedom, d.f. = n − 1. Find the critical value(s) using the
t-distribution table in the row with n − 1 d.f. When the
hypothesis test is

(1) left-tailed, use the “One Tail, α” column with a negative
sign.

(2) right-tailed, use the “One Tail, α” column with a positive
sign.

(3) two-tailed, use the “Two Tails, α” column with a negative
and a positive sign.

3. Critical value: t₀ = −1.328; Rejection region: t < −1.328

5. Critical value: t₀ = 1.717; Rejection region: t > 1.717

7. Critical values: −t₀ = −2.056, t₀ = 2.056
Rejection regions: t < −2.056, t > 2.056

9. (a) Fail to reject H₀ because t > −2.086.

(b) Fail to reject H₀ because t > −2.086.

(c) Reject H₀ because t < −2.086.

11. (a) Reject H₀ because t < −1.725.

(b) Fail to reject H₀ because −1.725 < t < 1.725.

(c) Reject H₀ because t > 1.725.

13. Fail to reject H₀. There is not enough evidence at the 1%
level of significance to reject the claim.

15. Reject H₀. There is enough evidence at the 1% level of
significance to reject the claim.

17. Fail to reject H₀. There is not enough evidence at the 2%
level of significance to reject the claim.

19. (a) The claim is “the mean rental of a warehouse (with basic
amenities) is $2500.”
H₀: μ = 2500 (claim); Hₐ: μ ≠ 2500

(b) −t₀ = −2.715, t₀ = 2.715
Rejection regions: t < −2.715, t > 2.715

(c) 3.082 (d) Reject H₀.


(e) There is enough evidence at the 1% level of significance
to reject the claim that the mean rental of a warehouse
(with basic amenities) is $2500.

21. (a) The claim is “the mean credit card debt by state is
greater than $5500 per person.”

H₀: μ ≤ 5500; Hₐ: μ > 5500 (claim)

(b) t₀ = 1.699; Rejection region: t > 1.699

(c) 0.86 (d) Fail to reject H₀.

(e) There is not enough evidence at the 5% level of
significance to support the credit reporting agency’s
claim that the mean credit card debt by state is greater
than $5500 per person.

23. (a) The claim is “the mean amount of carbon monoxide in
the air in U.S. cities is less than 2.34 parts per million.”
H₀: μ ≥ 2.34; Hₐ: μ < 2.34 (claim)

(b) t₀ = −1.295; Rejection region: t < −1.295

(c) 0.11 (d) Fail to reject H₀.

(e) There is not enough evidence at the 10% level of
significance to support the claim that the mean amount
of carbon monoxide in the air in U.S. cities is less than
2.34 parts per million.

25. (a) The claim is “the mean annual salary for senior-level
product engineers is $98,000.”

H₀: μ = 98,000 (claim); Hₐ: μ ≠ 98,000

(b) −t₀ = −2.131, t₀ = 2.131
Rejection regions: t < −2.131, t > 2.131

(c) −1.87 (d) Fail to reject H₀.

(e) There is not enough evidence at the 5% level of
significance to reject the employment information
service’s claim that the mean annual salary for
senior-level product engineers is $98,000.

27. (a) The claim is “the mean minimum time it takes for a sedan
to travel a quarter mile is greater than 14.7 seconds.”
H₀: μ ≤ 14.7; Hₐ: μ > 14.7 (claim)

(b) 0.0664 (c) Reject H₀.

(d) There is enough evidence at the 10% level of significance
to support the consumer group’s claim that the mean
minimum time it takes for a sedan to travel a quarter
mile is greater than 14.7 seconds.

29. (a) The claim is “the mean number of deliveries per
deliveryman is more than 28 packets per day.”

H₀: μ ≤ 28; Hₐ: μ > 28 (claim)

(b) 0.025 (c) Reject H₀.

(d) There is enough evidence at the 5% level of significance
to support the claim that the mean number of deliveries
per deliveryman is more than 28 packets per day.

31. Use the t-distribution because σ is unknown, the sample is
random, and the population is normally distributed.

Fail to reject H₀. There is not enough evidence at the 5%
level of significance to reject the car company’s claim that
the mean gas mileage for the luxury sedan is at least 23 miles
per gallon.

33. More likely; the tails of a t-distribution curve are thicker
than those of a standard normal distribution curve. So, if
you incorrectly use a standard normal sampling distribution
instead of a t-sampling distribution, then the area under the
curve at the tails will be smaller than what it would be for
the t-test, meaning the critical value(s) will lie closer to the
mean. This makes it more likely for the test statistic to be
in the rejection region(s). This result is the same regardless
of whether the test is left-tailed, right-tailed, or two-tailed;
in each case, the tail thickness affects the location of the
critical value(s).


Section 7.3 Activity (page 408)


1-3. Answers will vary.
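Answer 1 above describes reading t critical values from a table, and answer 33 argues that t critical values lie farther from the mean than z critical values. As a sketch under stated assumptions, the Python below computes t critical values without a table by integrating the t density numerically (Simpson's rule) and bisecting on the resulting CDF. It uses only the standard library; the function names are illustrative, and the degrees of freedom in the usage lines (19, 22, 26) are inferred from the tabled values in answers 3, 5, and 7 rather than stated in the answers. A statistics library would normally be used instead.

```python
import math
from statistics import NormalDist


def t_cdf(x, df, steps=1000):
    """P(T <= x) for Student's t with df degrees of freedom.
    Integrates the density from 0 to |x| with Simpson's rule
    (steps must be even) and uses the symmetry of the curve."""
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return coef * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # area under the curve between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(alpha, df, tails=1):
    """Positive critical value t0 leaving alpha / tails in the right tail.
    Use -t0 for a left-tailed test and both -t0, t0 for a two-tailed test."""
    target = 1 - alpha / tails
    lo, hi = 0.0, 100.0
    for _ in range(45):  # bisection on the CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(-t_critical(0.10, 19), 3))          # answer 3: left-tailed
print(round(t_critical(0.05, 22), 3))           # answer 5: right-tailed
print(round(t_critical(0.05, 26, tails=2), 3))  # answer 7: two-tailed
# Answer 33: z0 is closer to the mean than t0 at the same alpha.
print(round(NormalDist().inv_cdf(0.95), 3), round(t_critical(0.05, 22), 3))
```

The last line compares the one-tailed z and t critical values at α = 0.05; the t value is larger because of the thicker tails.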


Section 7.4 (page 413)


1. If np ≥ 5 and nq ≥ 5, then the normal distribution can be
used.

3. Cannot use normal distribution.

5. Can use normal distribution.

Fail to reject H₀. There is not enough evidence at the 5%
level of significance to support the claim.

7. (a) The claim is “less than 80% of U.S. adults think that
healthy children should be required to be vaccinated.”
H₀: p ≥ 0.80; Hₐ: p < 0.80 (claim)

(b) z₀ = −1.645; Rejection region: z < −1.645

(c) 0.707 (d) Fail to reject H₀.

(e) There is not enough evidence at the 5% level of
significance to support the medical researcher’s claim
that less than 80% of U.S. adults think that healthy
children should be required to be vaccinated.

9. (a) The claim is “at most 3% of working college students
are employed as teachers or teaching assistants.”
H₀: p ≤ 0.03 (claim); Hₐ: p > 0.03

(b) z₀ = 2.33; Rejection region: z > 2.33

(c) 0.83 (d) Fail to reject H₀.

(e) There is not enough evidence at the 1% level of
significance to reject the education researcher’s claim
that at most 3% of working college students are
employed as teachers or teaching assistants.

11. (a) The claim is “85% of Americans think they are unlikely
to contract the Zika virus.”

H₀: p = 0.85 (claim); Hₐ: p ≠ 0.85

(b) −z₀ = −1.96, z₀ = 1.96
Rejection regions: z < −1.96, z > 1.96

(c) 2.21 (d) Reject H₀.

(e) There is enough evidence at the 5% level of significance
to reject the medical researcher’s claim that 85% of
Americans think they are unlikely to contract the Zika
virus.


13. (a) The claim is “27% of U.S. adults would travel into space
on a commercial flight if they could afford it.”
H₀: p = 0.27 (claim); Hₐ: p ≠ 0.27

(b) 0.03 (c) Reject H₀.

(d) There is enough evidence at the 5% level of significance
to reject the research center’s claim that 27% of U.S.
adults would travel into space on a commercial flight if
they could afford it.

15. (a) The claim is “less than 67% of U.S. households own a
pet.”

H₀: p ≥ 0.67; Hₐ: p < 0.67 (claim)

(b) 0.15 (c) Fail to reject H₀.

(d) There is not enough evidence at the 10% level of
significance to support the humane society’s claim that
less than 67% of U.S. households own a pet.

17. Fail to reject H₀. There is not enough evidence at the 5%
level of significance to reject the claim that at least 63% of
adults make an effort to live in ways that help protect the
environment some of the time.

19. (a) The claim is “less than 80% of U.S. adults think that
healthy children should be required to be vaccinated.”
H₀: p ≥ 0.80; Hₐ: p < 0.80 (claim)

(b) z₀ = −1.645; Rejection region: z < −1.645

(c) 0.707 (d) Fail to reject H₀.

(e) There is not enough evidence at the 5% level of
significance to support the medical researcher’s claim
that less than 80% of U.S. adults think that healthy
children should be required to be vaccinated.

The results are the same.


Section 7.4 Activity (page 415)


1-2. Answers will vary.
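Answer 1 of Section 7.4 gives the condition for using the normal distribution in a test for a proportion (np ≥ 5 and nq ≥ 5). A minimal Python sketch of that check, together with the standardized test statistic z = (p̂ − p)/√(pq/n) and its P-value, is shown below. The function names and the sample values (n = 100 or 200, p̂ = 0.6) are hypothetical and used only for illustration; none of them come from the exercises.

```python
import math
from statistics import NormalDist


def normal_approx_ok(n, p):
    """True when np >= 5 and nq >= 5, where q = 1 - p."""
    return n * p >= 5 and n * (1 - p) >= 5


def proportion_z(p_hat, p, n):
    """Standardized test statistic z = (p_hat - p) / sqrt(p q / n)."""
    return (p_hat - p) / math.sqrt(p * (1 - p) / n)


def p_value(z, tail):
    """P-value of z; tail is 'left', 'right', or 'two'."""
    std = NormalDist()
    if tail == "left":
        return std.cdf(z)
    if tail == "right":
        return 1 - std.cdf(z)
    if tail == "two":
        return 2 * (1 - std.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right', or 'two'")


# Hypothetical checks: with p = 0.03, n = 100 fails (np = 3) but n = 200 passes.
print(normal_approx_ok(100, 0.03), normal_approx_ok(200, 0.03))
# Hypothetical two-tailed test of H0: p = 0.5 with p_hat = 0.6 and n = 100.
z = proportion_z(0.6, 0.5, 100)
print(round(z, 2), round(p_value(z, "two"), 4))
```

The condition is checked with the hypothesized p from H₀, not with the sample proportion, because the sampling distribution is built under the assumption that H₀ is true.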


Section 7.5 (page 422)


1. Specify the level of significance α. Determine the degrees
of freedom. Determine the critical values using the
χ²-distribution. For a right-tailed test, use the value that
corresponds to d.f. and α; for a left-tailed test, use the value
that corresponds to d.f. and 1 − α; for a two-tailed test, use
the values that correspond to d.f. and ½α, and d.f. and 1 − ½α.

3. The requirement of a normal distribution is more important
when testing a standard deviation than when testing a
mean. When the population is not normal, the results of a
chi-square test can be misleading because the chi-square test
is not as robust as the tests for the population mean.

5. Critical value: χ²₀ = 38.885; Rejection region: χ² > 38.885

7. Critical value: χ²₀ = 0.872; Rejection region: χ² < 0.872

9. Critical values: χ²_L = 60.391, χ²_R = 101.879
Rejection regions: χ² < 60.391, χ² > 101.879

11. Critical value: χ²₀ = 49.588; Rejection region: χ² > 49.588

13. (a) Fail to reject H₀ because χ² < 6.251.

(b) Fail to reject H₀ because χ² < 6.251.

(c) Fail to reject H₀ because χ² < 6.251.

(d) Reject H₀ because χ² > 6.251.


15. Fail to reject H₀. There is not enough evidence at the 5%
level of significance to reject the claim.

17. Reject H₀. There is enough evidence at the 1% level of
significance to reject the claim.

19. Reject H₀. There is enough evidence at the 10% level of
significance to support the claim.

21. Fail to reject H₀. There is not enough evidence at the 1%
level of significance to support the claim.

23. (a) The claim is “the variance of the thickness in a certain
helmet model is 7.5.”

H₀: σ² = 7.5 (claim); Hₐ: σ² ≠ 7.5

(b) χ²_L = 6.571, χ²_R = 23.685
Rejection regions: χ² < 6.571, χ² > 23.685

(c) 5.04 (d) Reject H₀.

(e) There is enough evidence at the 10% level of significance
to reject the claim that the variance of the thickness in a
certain helmet model is 7.5.

25. (a) The claim is “the standard deviation for grade 12
students on a mathematics assessment test is less than
35 points.”

H₀: σ ≥ 35; Hₐ: σ < 35 (claim)

(b) χ²₀ = 18.114; Rejection region: χ² < 18.114

(c) 25.48 (d) Fail to reject H₀.

(e) There is not enough evidence at the 10% level of
significance to support the school administrator’s claim
that the standard deviation for grade 12 students on a
mathematics assessment test is less than 35 points.

27. (a) The claim is “the standard deviation of the reading
days of the readers of a particular book is no more than
2 days.”

H₀: σ ≤ 2 (claim); Hₐ: σ > 2

(b) χ²₀ = 40.256; Rejection region: χ² > 40.256

(c) 67.5 (d) Reject H₀.

(e) There is enough evidence at the 10% level of significance
to reject the claim that the standard deviation of the
reading days of the readers of a particular book is no
more than 2 days.

29. (a) The claim is “the standard deviation of the annual
salaries of senior-level graphic design specialists is
different from $10,300.”

H₀: σ = 10,300; Hₐ: σ ≠ 10,300 (claim)

(b) χ²_L = 5.629, χ²_R = 26.119
Rejection regions: χ² < 5.629, χ² > 26.119

(c) 27.86 (d) Reject H₀.

(e) There is enough evidence at the 5% level of significance
to support the claim that the standard deviation of the
annual salaries of senior-level graphic design specialists
is different from $10,300.

31. P-value = 0.4524; Fail to reject H₀.

33. P-value = 0.0001; Reject H₀.
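Answer 1 of Section 7.5 explains which areas to use when looking up χ² critical values, and the chi-square answers rely on the statistic χ² = (n − 1)s²/σ². The Python sketch below computes χ² critical values without a table: it evaluates the chi-square CDF through the power series of the regularized lower incomplete gamma function and then bisects on that CDF. It uses only the standard library, and the function names are illustrative. The d.f. and α values in the usage lines are inferred from the tabled critical values in answers 5, 7, and 9 and in Review Exercises 59 and 61; they are not stated in the answers. For the bolt example, n = 28 and s² = 0.064 are back-solved from d.f. = 27 and the reported χ² = 172.8, so they are assumptions.

```python
import math


def chi2_cdf(x, df):
    """P(X <= x) for a chi-square variable with df degrees of freedom,
    via the series for the regularized lower incomplete gamma P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    s, y = df / 2, x / 2
    term = 1.0 / s  # n = 0 term of  sum_n y**n / (s (s+1) ... (s+n))
    total = term
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= y / (s + n)
        total += term
    return math.exp(s * math.log(y) - y - math.lgamma(s)) * total


def chi2_critical(area_right, df):
    """Value with the given area to its right (bisection on the CDF).
    Right-tailed: area_right = alpha. Left-tailed: 1 - alpha.
    Two-tailed: alpha / 2 (right value) and 1 - alpha / 2 (left value)."""
    lo, hi = 0.0, df + 20 * math.sqrt(2 * df) + 20
    for _ in range(80):
        mid = (lo + hi) / 2
        if 1 - chi2_cdf(mid, df) > area_right:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def chi2_statistic(n, s2, sigma2):
    """Standardized test statistic chi2 = (n - 1) s^2 / sigma^2."""
    return (n - 1) * s2 / sigma2


print(round(chi2_critical(0.05, 26), 3))   # answer 5: right-tailed
print(round(chi2_critical(0.99, 6), 3))    # answer 7: left-tailed, alpha = 0.01
print(round(chi2_critical(0.95, 80), 3), round(chi2_critical(0.05, 80), 3))  # answer 9
print(round(chi2_statistic(28, 0.064, 0.01), 1),  # Review Exercise 59
      round(chi2_critical(0.005, 27), 3),         # Review Exercise 59, alpha = 0.005
      round(chi2_critical(0.01, 27), 3))          # Review Exercise 61, alpha = 0.01
```

A left-tailed critical value is requested by its area to the right (1 − α). This convention matches the right-tail areas used in the χ² table.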


Uses and Abuses for Chapter 7 (page 426)

1. H₀: p = 0.57; Answers will vary.

2-3. Answers will vary.


Review Exercises for Chapter 7 (page 428)


1. H₀: μ ≤ 100 (claim); Hₐ: μ > 100

3. H₀: p ≥ 0.205; Hₐ: p < 0.205 (claim)

5. H₀: σ ≤ 2.5; Hₐ: σ > 2.5 (claim)

7. (a) H₀: p = 0.65 (claim); Hₐ: p ≠ 0.65

(b) A type I error will occur when the actual proportion of
U.S. adults who have volunteered their time or donated
money to help clean up the environment is 65%, but you
reject H₀: p = 0.65.

A type II error will occur when the actual proportion is
not 65%, but you fail to reject H₀: p = 0.65.

(c) Two-tailed because the alternative hypothesis
contains ≠.

(d) There is enough evidence to reject the polling
organization’s claim that the proportion of U.S. adults
who have volunteered their time or donated money to
help clean up the environment is 65%.

(e) There is not enough evidence to reject the polling
organization’s claim that the proportion of U.S. adults
who have volunteered their time or donated money to
help clean up the environment is 65%.

9. (a) H₀: σ ≤ 9.5 (claim); Hₐ: σ > 9.5

(b) A type I error will occur when the actual standard
deviation of the fuel economies is no more than 9.5 miles
per gallon, but you reject H₀: σ ≤ 9.5.

A type II error will occur when the actual standard
deviation of the fuel economies is more than 9.5 miles
per gallon, but you fail to reject H₀: σ ≤ 9.5.

(c) Right-tailed because the alternative hypothesis
contains >.

(d) There is enough evidence to reject the nonprofit
consumer organization’s claim that the standard
deviation of the fuel economies of its top-rated vehicles
for a recent year is no more than 9.5 miles per gallon.

(e) There is not enough evidence to reject the nonprofit
consumer organization’s claim that the standard
deviation of the fuel economies of its top-rated vehicles
for a recent year is no more than 9.5 miles per gallon.

11. 0.1190; Fail to reject H₀.

13. Critical value: z₀ = −2.05; Rejection region: z < −2.05

15. Critical value: z₀ = 1.96; Rejection region: z > 1.96

17. Fail to reject H₀ because −1.645 < z < 1.645.

19. Fail to reject H₀ because −1.645 < z < 1.645.

21. Fail to reject H₀. There is not enough evidence at the 10%
level of significance to reject the claim.


23. Because z < −2.17, reject H₀. There is enough evidence at
the 3% level of significance to support the claim.

25. (a) The claim is “the mean annual production of cotton is
3.5 million bales per country.”

H₀: μ = 3.5 (claim); Hₐ: μ ≠ 3.5

(b) −2.06 (c) 0.0394 (d) Reject H₀.

(e) There is enough evidence at the 5% level of significance
to reject the researcher’s claim that the mean annual
production of cotton is 3.5 million bales per country.

27. (a) The claim is “the mean amount of sulfur dioxide in the
air in U.S. cities is 1.15 parts per billion.”

H₀: μ = 1.15 (claim); Hₐ: μ ≠ 1.15

(b) −z₀ = −2.575, z₀ = 2.575
Rejection regions: z < −2.575, z > 2.575

(c) −0.97 (d) Fail to reject H₀.

(e) There is not enough evidence at the 1% level of
significance to reject the environmental researcher’s
claim that the mean amount of sulfur dioxide in the air
in U.S. cities is 1.15 parts per billion.

29. Critical values: −t₀ = −2.11, t₀ = 2.11
Rejection regions: t < −2.11, t > 2.11

31. Critical value: t₀ = 1.796; Rejection region: t > 1.796

33. Critical value: t₀ = −2.977; Rejection region: t < −2.977

35. Reject H₀. There is enough evidence at the 0.5% level of
significance to support the claim.

37. Reject H₀. There is enough evidence at the 1% level of
significance to reject the claim.

39. Fail to reject H₀. There is not enough evidence at the 10%
level of significance to reject the claim.

41. (a) The claim is “the mean monthly cost of joining a health
club is $25.”

H₀: μ = 25 (claim); Hₐ: μ ≠ 25

(b) −t₀ = −1.740, t₀ = 1.740
Rejection regions: t < −1.740, t > 1.740

(c) 1.64 (d) Fail to reject H₀.

(e) There is not enough evidence at the 10% level of
significance to reject the advertisement’s claim that the
mean monthly cost of joining a health club is $25.

43. (a) The claim is “the mean score for grade 12 students on a
science achievement test is more than 145.”

H₀: μ ≤ 145; Hₐ: μ > 145 (claim)

(b) 0.0824 (c) Reject H₀.

(d) There is enough evidence at the 10% level of
significance to support the education publication’s claim
that the mean score for grade 12 students on a science
achievement test is more than 145.

45. Can use normal distribution.

Fail to reject H₀. There is not enough evidence at the 5%
level of significance to reject the claim.

47. Can use normal distribution.

Reject H₀. There is enough evidence at the 1% level of
significance to support the claim.


49. (a) The claim is “over 40% of U.S. adults say they are less
likely to travel to Europe in the next six months for fear
of terrorist attacks.”

H₀: p ≤ 0.40; Hₐ: p > 0.40 (claim)

(b) z₀ = 2.33; Rejection region: z > 2.33

(c) 1.29 (d) Fail to reject H₀.

(e) There is not enough evidence at the 1% level of
significance to support the polling agency’s claim that
over 40% of U.S. adults say they are less likely to travel
to Europe in the next six months for fear of terrorist
attacks.

51. Critical value: χ²₀ = 30.144; Rejection region: χ² > 30.144

53. Critical values: χ²_L = 26.509, χ²_R = 55.758
Rejection regions: χ² < 26.509, χ² > 55.758

55. Reject H₀. There is enough evidence at the 10% level of
significance to support the claim.

57. Fail to reject H₀. There is not enough evidence at the 5%
level of significance to reject the claim.

59. (a) The claim is “the variance of the bolt widths is at most
0.01.”

H₀: σ² ≤ 0.01 (claim); Hₐ: σ² > 0.01

(b) χ²₀ = 49.645; Rejection region: χ² > 49.645

(c) 172.8 (d) Reject H₀.

(e) There is enough evidence at the 0.5% level of significance
to reject the bolt manufacturer’s claim that the variance
is at most 0.01.

61. You can reject H₀ at the 1% level of significance because
χ² = 172.8 > 46.963.


Quiz for Chapter 7 (page 432)


1. (a) The claim is “the mean hat size for a male is at least
7.25.”

H₀: μ ≥ 7.25 (claim); Hₐ: μ < 7.25

(b) Left-tailed because the alternative hypothesis contains
<; z-test because σ is known and the population is
normally distributed.

(c) Sample answer: z₀ = −2.33;
Rejection region: z < −2.33; −1.28

(d) Fail to reject H₀.

(e) There is not enough evidence at the 1% level of
significance to reject the company’s claim that the mean
hat size for a male is at least 7.25.

2. (a) The claim is “the mean daily base price for renting
a full-size or less expensive vehicle in Vancouver,
Washington, is more than $36.”

H₀: μ ≤ 36; Hₐ: μ > 36 (claim)

(b) Right-tailed because the alternative hypothesis contains
>; z-test because σ is known and n ≥ 30.

(c) Sample answer: z₀ = 1.28;
Rejection region: z > 1.28; 1.997

(d) Reject H₀.

(e) There is enough evidence at the 10% level of significance
to support the travel analyst’s claim that the mean daily
base price for renting a full-size or less expensive
vehicle in Vancouver, Washington, is more than $36.


3. (a) The claim is “the mean amount of earnings for full-time
workers ages 18 to 24 with a bachelor’s degree in a
recent year is $47,254.”

H₀: μ = 47,254 (claim); Hₐ: μ ≠ 47,254

(b) Two-tailed because the alternative hypothesis contains
≠; t-test because σ is unknown and the population is
normally distributed.

(c) Sample answer: −t₀ = −2.145, t₀ = 2.145;
Rejection regions: t < −2.145, t > 2.145; 2.58

(d) Reject H₀.

(e) There is enough evidence at the 5% level of
significance to reject the government agency’s claim
that the mean amount of earnings for full-time workers
ages 18 to 24 with a bachelor’s degree in a recent year is
$47,254.

4. (a) The claim is “program participants have a mean weight
loss of at least 10.5 pounds after 1 month.”

H₀: μ ≥ 10.5 (claim); Hₐ: μ < 10.5

(b) Left-tailed because the alternative hypothesis contains
<; t-test because σ is unknown and n ≥ 30.

(c) Sample answer: t₀ = −2.462;
Rejection region: t < −2.462; −3.09

(d) Reject H₀.

(e) There is enough evidence at the 1% level of significance
to reject the weight loss program’s claim that program
participants have a mean weight loss of at least 10.5
pounds after 1 month.

5. (a) The claim is “less than 18% of the vehicles a nonprofit
consumer organization rated in a recent year have an
overall score of 78 or more.”

H₀: p ≥ 0.18; Hₐ: p < 0.18 (claim)

(b) Left-tailed because the alternative hypothesis contains
<; z-test because np ≥ 5 and nq ≥ 5.

(c) Sample answer: z₀ = −1.645;
Rejection region: z < −1.645; 0.49

(d) Fail to reject H₀.

(e) There is not enough evidence at the 5% level of
significance to support the nonprofit consumer
organization’s claim that less than 18% of the vehicles it
rated in a recent year have an overall score of 78 or more.

6. (a) The claim is “the standard deviation of vehicle rating
scores is 11.90.”

H₀: σ = 11.90 (claim); Hₐ: σ ≠ 11.90

(b) Two-tailed because the alternative hypothesis contains
≠; chi-square test because the test is for a standard
deviation and the population is normally distributed.

(c) Sample answer: χ²_L = 68.249, χ²_R = 112.022;
Rejection regions: χ² < 68.249, χ² > 112.022; 89.90

(d) Fail to reject H₀.

(e) There is not enough evidence at the 10% level of
significance to reject the nonprofit consumer
organization’s claim that the standard deviation of
vehicle rating scores is 11.90.
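Every conclusion sentence in the answers above follows one of four patterns. The pattern depends on whether H₀ is rejected and on whether the claim is H₀ or Hₐ: a claim stated as H₀ can only be rejected or not rejected, while a claim stated as Hₐ can only be supported or not supported. A small Python sketch of that mapping is shown below. The function names are illustrative, and the P-value rule "reject when P ≤ α" is an assumption about how ties are handled; the answers themselves only compare P with α.

```python
def reject_h0(p_value, alpha):
    """Decision rule with a P-value: reject H0 when P <= alpha."""
    return p_value <= alpha


def interpret(reject, claim_is_h0, level):
    """Conclusion sentence in the style of the answers above.

    reject      -- True when H0 is rejected
    claim_is_h0 -- True when the claim is the null hypothesis,
                   False when the claim is the alternative hypothesis
    level       -- level of significance as text, for example "5%"
    """
    amount = "enough" if reject else "not enough"
    # A claim in H0 can only be rejected (or not); a claim in Ha
    # can only be supported (or not).
    action = "reject" if claim_is_h0 else "support"
    return (f"There is {amount} evidence at the {level} level of "
            f"significance to {action} the claim.")


# Quiz 3: the claim is H0, and H0 is rejected.
print(interpret(True, claim_is_h0=True, level="5%"))
# Quiz 5: the claim is Ha, and H0 is not rejected.
print(interpret(False, claim_is_h0=False, level="5%"))
```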


Real Statistics—Real Decisions for Chapter 7 (page 434)


1. Answers will vary.
2. (a) Not random; Sample answer: Randomly select students
from the student directory.
(b) Random (c) Random
3. Alternative hypothesis because you cannot use a hypothesis
test to support your claim if your claim is the null hypothesis.
4. No 5. No 6. Answers will vary.


Chapter 8
Section 8.1 (page 446)


1. Two samples are dependent when each member of one
sample corresponds to a member of the other sample.
Example: The weights of 22 people before starting an
exercise program and the weights of the same 22 people
6 weeks after starting the exercise program.
Two samples are independent when the sample selected
from one population is not related to the sample selected
from the other population. Example: The weights of 25 cats
and the weights of 20 dogs.

3. Use P-values.

5. Dependent because the same football players were sampled.

7. Independent because different boats were sampled.

9. Reject H₀.

11. Reject H₀. There is enough evidence at the 1% level of
significance to reject the claim.

13. Fail to reject H₀. There is not enough evidence at the 5%
level of significance to support the claim.

15. (a) The claim is “the mean braking distances are different
for the two makes of automobiles.”
H₀: μ₁ = μ₂; Hₐ: μ₁ ≠ μ₂ (claim)

(b) −z₀ = −1.645, z₀ = 1.645
Rejection regions: z < −1.645, z > 1.645

(c) 2.80 (d) Reject H₀.

(e) There is enough evidence at the 10% level of significance
to support the safety engineer’s claim that the mean
braking distances are different for the two makes of
automobiles.

17. (a) The claim is “Region A receives less rainfall than
Region B.”
H₀: μ₁ ≥ μ₂; Hₐ: μ₁ < μ₂ (claim)

(b) z₀ = −2.33; Rejection region: z < −2.33

(c) −2.171 (d) Fail to reject H₀.

(e) There is not enough evidence at the 1% level of significance
to support the claim that Region A receives less rainfall than
Region B.

19. (a) The claim is “ACT mathematics and science scores are
equal.”
H₀: μ₁ = μ₂ (claim); Hₐ: μ₁ ≠ μ₂

(b) −z₀ = −2.575, z₀ = 2.575
Rejection regions: z < −2.575, z > 2.575

(c) −0.21 (d) Fail to reject H₀.


(e) There is not enough evidence at the 1% level of
significance to reject the claim that ACT mathematics
and science scores are equal.

21. (a) The claim is “the mean home sales price in Casper,
Wyoming, is the same as in Cheyenne, Wyoming.”
H₀: μ₁ = μ₂ (claim); Hₐ: μ₁ ≠ μ₂

(b) −z₀ = −2.575, z₀ = 2.575
Rejection regions: z < −2.575, z > 2.575

(c) 0.15 (d) Fail to reject H₀.

(e) There is not enough evidence at the 1% level of
significance to reject the real estate agency’s claim that
the mean home sales price in Casper, Wyoming, is the
same as in Cheyenne, Wyoming.

23. (a) The claim is “the precipitation in Seattle, Washington,
was greater than in Birmingham, Alabama.”
H₀: μ₁ ≤ μ₂; Hₐ: μ₁ > μ₂ (claim)

(b) z₀ = 1.645; Rejection region: z > 1.645

(c) 1.02 (d) Fail to reject H₀.

(e) There is not enough evidence at the 5% level of
significance to support the climatologist’s claim that the
precipitation in Seattle, Washington, was greater than in
Birmingham, Alabama.

25. They are equivalent through algebraic manipulation of the
equation:

μ₁ = μ₂ ⇔ μ₁ − μ₂ = 0

27. H₀: μ₁ − μ₂ ≤ 2000; Hₐ: μ₁ − μ₂ > 2000 (claim)

Fail to reject H₀. There is not enough evidence at the 5%
level of significance to support the claim that the difference
between the mean annual salaries of entry-level software
engineers in Raleigh, North Carolina, and Wichita, Kansas,
is more than $2000.

29. −$3129 < μ₁ − μ₂ < $6449
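Answers 25 and 27 of Section 8.1 rely on the fact that a test of μ₁ = μ₂ is a test of μ₁ − μ₂ = 0, and that other hypothesized differences (such as 2000) fit the same form. A minimal Python sketch of the two-sample z statistic, assuming the population standard deviations are known, is shown below. The summary statistics in the usage lines are hypothetical and chosen only for illustration, and the function name is illustrative.

```python
import math
from statistics import NormalDist


def two_sample_z(xbar1, xbar2, sigma1, sigma2, n1, n2, hyp_diff=0.0):
    """Standardized test statistic for mu1 - mu2 with known sigmas:
        z = ((xbar1 - xbar2) - hyp_diff) / sqrt(sigma1^2/n1 + sigma2^2/n2)
    hyp_diff is the hypothesized mu1 - mu2 (0 for H0: mu1 = mu2)."""
    standard_error = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    return ((xbar1 - xbar2) - hyp_diff) / standard_error


# Hypothetical summary statistics, for illustration only.
z = two_sample_z(xbar1=10.0, xbar2=8.0, sigma1=2.0, sigma2=2.0, n1=8, n2=8)
p_right = 1 - NormalDist().cdf(z)  # right-tailed P-value for Ha: mu1 > mu2
print(z, round(p_right, 4))
```

Passing a nonzero `hyp_diff` tests a claim about the size of the difference, as in answer 27, rather than only whether a difference exists.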


Section 8.2 (page 454)


1. (1) The population standard deviations are unknown.
(2) The samples are randomly selected.
(3) The samples are independent.
(4) The populations are normally distributed or each sample
size is at least 30.
3. (a) −t₀ = −1.714, t₀ = 1.714
(b) −t₀ = −1.812, t₀ = 1.812
5. (a) t₀ = −1.746 (b) t₀ = −1.943
7. (a) t₀ = 1.729 (b) t₀ = 1.895
9. Fail to reject H₀. There is not enough evidence at the 1%
level of significance to reject the claim.
11. Reject H₀. There is enough evidence at the 5% level of
significance to reject the claim.
13. (a) The claim is “the mean annual costs of food for dogs and
cats are the same.”
H₀: μ₁ = μ₂ (claim); Hₐ: μ₁ ≠ μ₂
(b) −t₀ = −1.694, t₀ = 1.694
Rejection regions: t < −1.694, t > 1.694
(c) 8.19 (d) Reject H₀.
(e) There is enough evidence at the 10% level of significance
to reject the pet association’s claim that the mean annual
costs of food for dogs and cats are the same.

15. (a) The claim is “the units produced at Site A contain more
defects than the units produced at Site B.”

H₀: μ₁ ≤ μ₂; Hₐ: μ₁ > μ₂ (claim)

(b) t₀ = 1.684; Rejection region: t > 1.684

(c) 1.86 (d) Reject H₀.

(e) There is enough evidence at the 5% level of significance
to support the claim that the units produced at Site A
have more defects than units produced at Site B.

17. (a) The claim is “the mean household income in a recent
year is greater in Cuyahoga County, Ohio, than it is in
Wayne County, Michigan.”

H₀: μ₁ ≤ μ₂; Hₐ: μ₁ > μ₂ (claim)

(b) t₀ = 1.761; Rejection region: t > 1.761

(c) 5.65 (d) Reject H₀.

(e) There is enough evidence at the 5% level of significance
to support the demographics researcher’s claim that
the mean household income in a recent year is greater
in Cuyahoga County, Ohio, than it is in Wayne County,
Michigan.

19. (a) The claim is “an experimental method makes a
difference in the tensile strength of steel bars.”

H₀: μ₁ = μ₂; Hₐ: μ₁ ≠ μ₂ (claim)

(b) −t₀ = −2.819, t₀ = 2.819
Rejection regions: t < −2.819, t > 2.819

(c) −2.64 (d) Fail to reject H₀.

(e) There is not enough evidence at the 1% level of
significance to support the claim that an experimental
method makes a difference in the tensile strength of
steel bars.

21. (a) The claim is “the new method of teaching reading
produces higher reading test scores than the old
method.”

H₀: μ₁ ≥ μ₂; Hₐ: μ₁ < μ₂ (claim)

(b) t₀ = −1.303; Rejection region: t < −1.303

(c) −4.286 (d) Reject H₀.

(e) There is enough evidence at the 10% level of significance
to support the claim that the new method of teaching
reading produces higher reading test scores than the old
method.


93, 0.07 <a, =p S008 225, “208 Say = pay SS 
Section 8.3 (page 464) 
1. (1) The samples are randomly selected.
(2) The samples are dependent.
(3) The populations are normally distributed or the number n
of pairs of data is at least 30.

3. Fail to reject H₀. There is not enough evidence at the
5% level of significance to support the claim.

5. Reject H₀. There is enough evidence at the 10% level of
significance to reject the claim.

7. Reject H₀. There is enough evidence at the 1% level of
significance to reject the claim.


9. (a) The claim is “seven of the stocks that make up the Dow
Jones Industrial Average lost value from one hour to the
next on one business day.”

H₀: μ_d ≤ 0; Hₐ: μ_d > 0 (claim)

(b) t₀ = 3.143; Rejection region: t > 3.143

(c) d̄ ≈ 0.087; s_d ≈ 0.405 (d) 0.569

(e) Fail to reject H₀.

(f) There is not enough evidence at the 1% level of
significance to support the stock market analyst’s claim
that seven of the stocks that make up the Dow Jones
Industrial Average lost value from one hour to the next
on one business day.

11. (a) The claim is “caffeine ingestion improves repeated
freestyle sprints in trained male swimmers.”

H₀: μ_d ≤ 0; Hₐ: μ_d > 0 (claim)

(b) t₀ = 3.365; Rejection region: t > 3.365

(c) d̄ ≈ 0.533; s_d ≈ 0.350 (d) 3.730

(e) Reject H₀.

(f) There is enough evidence at the 1% level of significance
to support the researcher’s claim that caffeine ingestion
improves repeated freestyle sprints in trained male
swimmers.

13. (a) The claim is “soft tissue therapy helps to reduce
the number of days per week patients suffer from
headaches.”

H₀: μ_d ≤ 0; Hₐ: μ_d > 0 (claim)

(b) t₀ = 2.567; Rejection region: t > 2.567

(c) d̄ = 1.5; s_d ≈ 1.249 (d) 5.095

(e) Reject H₀.

(f) There is enough evidence at the 1% level of significance
to support the physical therapist’s claim that soft tissue
therapy helps to reduce the number of days per week
patients suffer from headaches.

15. (a) The claim is “student housing rates have increased from
one academic year to the next.”

H₀: μ_d ≥ 0; Hₐ: μ_d < 0 (claim)

(b) t₀ = −1.796; Rejection region: t < −1.796

(c) d̄ = −254.5; s_d ≈ 291.767 (d) −3.022

(e) Reject H₀.

(f) There is enough evidence at the 5% level of significance
to support the college administrator’s claim that student
housing rates have increased from one academic year to
the next.

17. (a) The claim is “the product ratings have changed from last
year to this year.”

H₀: μ_d = 0; Hₐ: μ_d ≠ 0 (claim)

(b) −t₀ = −2.365, t₀ = 2.365
Rejection regions: t < −2.365, t > 2.365

(c) d̄ = −1; s_d ≈ 1.309 (d) −2.161 (Tech: −2.160)

(e) Fail to reject H₀.

(f) There is not enough evidence at the 5% level of
significance to support the claim that the product
ratings have changed from last year to this year.


19. (a) The claim is “the cookies produced by a manufacturer
have more fiber than the ones produced by its competitor.”
\(H_0: \mu_d \le 0\); \(H_a: \mu_d > 0\) (claim)

(b) \(t_0 = 3.143\); Rejection region: \(t > 3.143\)

(c) \(\bar{d} = 0.43\); \(s_d = 3.69\)  (d) 0.308

(e) Fail to reject \(H_0\).

(f) There is not enough evidence at the 1% level of
significance to support the claim that the cookies
produced by a manufacturer have more fiber than the ones
produced by its competitor.

21. Yes; \(P \approx 0.0058 < 0.05\), so you reject \(H_0\).
23. \(-1.76 < \mu_d < 1.29\)
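Every paired-sample answer in Section 8.3 depends on the test statistic \(t = \bar{d} / (s_d/\sqrt{n})\), which has \(n - 1\) degrees of freedom. The minimal Python sketch below computes it. It assumes the list of differences \(d = x_1 - x_2\) is already available. The function name `paired_t_statistic` and the sample differences are hypothetical and do not come from any exercise.

```python
import math
import statistics


def paired_t_statistic(differences):
    """Return (d_bar, s_d, t, df) for a paired-sample t-test.

    d_bar is the mean of the differences, s_d is their sample standard
    deviation (n - 1 in the denominator), and t = d_bar / (s_d / sqrt(n)).
    """
    n = len(differences)
    if n < 2:
        raise ValueError("need at least two paired differences")
    d_bar = statistics.mean(differences)
    s_d = statistics.stdev(differences)  # sample standard deviation
    t = d_bar / (s_d / math.sqrt(n))
    return d_bar, s_d, t, n - 1


if __name__ == "__main__":
    # Hypothetical differences, for illustration only.
    d_bar, s_d, t, df = paired_t_statistic([1, 2, 3, 4, 5])
    print(f"d_bar = {d_bar}, s_d = {s_d:.3f}, t = {t:.3f}, d.f. = {df}")
```

The sketch leaves out the table lookup on purpose. You still compare the returned t with the critical value(s) \(t_0\), as in parts (b) and (e) of Exercises 9-19.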


Section 8.4 (page 473)

1. (1) The samples are randomly selected.
(2) The samples are independent.
(3) \(n_1\bar{p} \ge 5\), \(n_1\bar{q} \ge 5\), \(n_2\bar{p} \ge 5\), and \(n_2\bar{q} \ge 5\)
3. Can use normal sampling distribution; Fail to reject \(H_0\).
There is not enough evidence at the 1% level of significance
to support the claim.

5. Can use normal sampling distribution; Reject \(H_0\). There is
enough evidence at the 10% level of significance to reject
the claim.

7. (a) The claim is “there is a difference in the proportion
of subjects who had no 12-week confirmed disability
progression.”

\(H_0: p_1 = p_2\); \(H_a: p_1 \ne p_2\) (claim)

(b) \(-z_0 = -2.575\), \(z_0 = 2.575\)
Rejection regions: \(z < -2.575\), \(z > 2.575\)

(c) -1.70  (d) Fail to reject \(H_0\).

(e) There is not enough evidence at the 1% level of
significance to support the claim that there is a difference
in the proportion of subjects who had no 12-week
confirmed disability progression.

9. (a) The claim is “there is a difference in the proportion
of those employed between females ages 20 to 24 and
males ages 20 to 24.”

\(H_0: p_1 = p_2\); \(H_a: p_1 \ne p_2\) (claim)

(b) \(-z_0 = -2.575\), \(z_0 = 2.575\)
Rejection regions: \(z < -2.575\), \(z > 2.575\)

(c) -5.82  (d) Reject \(H_0\).

(e) There is enough evidence at the 1% level of significance
to support the claim that there is a difference in the
proportion of those employed between females ages 20
to 24 and males ages 20 to 24.

11. (a) The claim is “the proportion of drivers who wear seat
belts is greater in the West than in the Northeast.”
\(H_0: p_1 \le p_2\); \(H_a: p_1 > p_2\) (claim)

(b) \(z_0 = 1.645\); Rejection region: \(z > 1.645\)

(c) 2.08  (d) Reject \(H_0\).

(e) There is enough evidence at the 5% level of significance
to support the claim that the proportion of drivers
who wear seat belts is greater in the West than in the
Northeast.


13. No, there is not enough evidence at the 5% level of 
significance to reject the claim that the proportion of 
newlywed Asians who have a spouse of a different race 
or ethnicity is the same as the proportion of newlywed 
Hispanics who have a spouse of a different race or ethnicity. 

15. Yes, there is enough evidence at the 1% level of significance 
to support the claim that the proportion of newlywed Asians 
who have a spouse of a different race or ethnicity is greater 
than the proportion of newlywed whites who have a spouse 
of a different race or ethnicity. 

17. Yes, there is enough evidence at the 1% level of significance 
to support the claim that the proportion of newlywed whites 
who have a spouse of a different race or ethnicity is less than 
the proportion of newlywed blacks who have a spouse of a 
different race or ethnicity. 

19. No, there is not enough evidence at the 1% level of 
significance to reject the claim that the proportion of men 
who work 40 hours per week is the same as the proportion 
of men who work more than 40 hours per week. 

21. Yes, there is enough evidence at the 5% level of significance
to support the claim that the proportion of the U.S. workforce
that works 40 hours per week is greater for women than for men.

23. \(-0.028 < p_1 - p_2 < -0.012\)

25. \(0.011 < p_1 - p_2 < 0.069\); Answers will vary.
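Exercises 7-11 use the two-sample z-test for proportions:

\(z = (\hat{p}_1 - \hat{p}_2) / \sqrt{\bar{p}\bar{q}(1/n_1 + 1/n_2)}\)

Here \(\bar{p} = (x_1 + x_2)/(n_1 + n_2)\) is the weighted estimate and \(\bar{q} = 1 - \bar{p}\).

The minimal Python sketch below computes this statistic. It assumes the success counts x1, x2 and the sample sizes n1, n2 are known. The name `two_proportion_z` is hypothetical. The guard clause mirrors condition (3) of Exercise 1.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Return (z, p_bar) for a two-sample z-test of H0: p1 = p2.

    p_bar = (x1 + x2) / (n1 + n2) is the weighted estimate of p1 and p2,
    and q_bar = 1 - p_bar.
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)
    q_bar = 1 - p_bar
    # A normal sampling distribution needs n*p_bar >= 5 and n*q_bar >= 5
    # for both samples.
    for n in (n1, n2):
        if n * p_bar < 5 or n * q_bar < 5:
            raise ValueError("normal sampling distribution conditions not met")
    standard_error = math.sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / standard_error
    return z, p_bar


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    z, p_bar = two_proportion_z(60, 100, 40, 100)
    print(f"p_bar = {p_bar:.3f}, z = {z:.3f}")
```

The returned z is then compared with \(\pm z_0\) (for example, \(\pm 2.575\) at \(\alpha = 0.01\) for a two-tailed test), as in Exercises 7 and 9.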


Uses and Abuses for Chapter 8 (page 476)

1. Answers will vary.

2. Blind: The patients do not know which group (medicine or
placebo) they belong to.
Double-blind: Neither the researcher nor the patient knows
which group (medicine or placebo) the patient belongs to.


Review Exercises for Chapter 8 (page 478)

1. Dependent because the same adults were sampled.

3. Independent because different vehicles were sampled.

5. Fail to reject \(H_0\). There is not enough evidence at the 5%
level of significance to reject the claim.

7. Reject \(H_0\). There is enough evidence at the 10% level of
significance to support the claim.

9. (a) The claim is “the mean sodium content of chicken
sandwiches at Restaurant A is less than the mean
sodium content of chicken sandwiches at Restaurant B.”
\(H_0: \mu_1 \ge \mu_2\); \(H_a: \mu_1 < \mu_2\) (claim)

(b) \(z_0 = -1.645\); Rejection region: \(z < -1.645\)

(c) -2.82 (d) Reject \(H_0\).

(e) There is enough evidence at the 5% level of significance
to support the researcher’s claim that the mean sodium
content of chicken sandwiches at Restaurant A is less
than the mean sodium content of chicken sandwiches at
Restaurant B.

11. Reject \(H_0\). There is enough evidence at the 5% level of
significance to reject the claim.

13. Fail to reject \(H_0\). There is not enough evidence at the 10%
level of significance to reject the claim.


15. Reject \(H_0\). There is enough evidence at the 1% level of
significance to support the claim.

17. (a) The claim is “the new method of teaching mathematics
produces higher mathematics test scores than the old
method does.”

\(H_0: \mu_1 \le \mu_2\); \(H_a: \mu_1 > \mu_2\) (claim)

(b) \(t_0 = 1.667\); Rejection region: \(t > 1.667\)

(c) 2.313 (d) Reject \(H_0\).

(e) There is enough evidence at the 5% level of significance
to support the claim that the new method of teaching
mathematics produces higher mathematics test scores
than the old method does.

19. Reject \(H_0\). There is enough evidence at the 1% level of
significance to reject the claim.

21. Reject \(H_0\). There is enough evidence at the 10% level of
significance to reject the claim.

23. (a) The claim is “the numbers of passing yards for college
football quarterbacks change from their junior to their
senior years.”

\(H_0: \mu_d = 0\); \(H_a: \mu_d \ne 0\) (claim)

(b) \(-t_0 = -2.262\), \(t_0 = 2.262\)
Rejection regions: \(t < -2.262\), \(t > 2.262\)

(c) \(\bar{d} = -12.3\); \(s_d \approx 553.0877\) (d) -0.070

(e) Fail to reject \(H_0\).

(f) There is not enough evidence at the 5% level of
significance to support the sports statistician’s claim
that the numbers of passing yards for college football
quarterbacks change from their junior to their senior
years.

25. Can use normal sampling distribution; Fail to reject \(H_0\).
There is not enough evidence at the 5% level of significance
to reject the claim.

27. Can use normal sampling distribution; Reject \(H_0\). There is
enough evidence at the 10% level of significance to support
the claim.

29. (a) The claim is “the proportion of subjects who had at least
24 weeks of accrued remission is the same for the two
groups.”

\(H_0: p_1 = p_2\) (claim); \(H_a: p_1 \ne p_2\)

(b) \(-z_0 = -1.96\), \(z_0 = 1.96\)
Rejection regions: \(z < -1.96\), \(z > 1.96\)

(c) 4.03 (d) Reject \(H_0\).

(e) There is enough evidence at the 5% level of significance
to reject the medical research team’s claim that the
proportion of subjects who had at least 24 weeks of
accrued remission is the same for the two groups.


Quiz for Chapter 8 (page 482)

1. (a) The claim is “the mean score on the reading assessment
test for male high school students is greater than the
mean score for female high school students.”

\(H_0: \mu_1 \le \mu_2\); \(H_a: \mu_1 > \mu_2\) (claim)

(b) Right-tailed because \(H_a\) contains \(>\); z-test because \(\sigma_1\)
and \(\sigma_2\) are known, the samples are random samples, the
samples are independent, and \(n_1 \ge 30\) and \(n_2 \ge 30\).

(c) \(z_0 = 1.645\); Rejection region: \(z > 1.645\)

(d) 0.12 (e) Fail to reject \(H_0\).

(f) There is not enough evidence at the 5% level of
significance to support the claim that the mean score
on the reading assessment test for the male high school
students was higher than for the female high school
students.

2. (a) The claim is “the mean scores on a music assessment
test for eighth grade boys and girls are equal.”
\(H_0: \mu_1 = \mu_2\) (claim); \(H_a: \mu_1 \ne \mu_2\)

(b) Two-tailed because \(H_a\) contains \(\ne\); t-test because \(\sigma_1\)
and \(\sigma_2\) are unknown, the samples are random samples,
the samples are independent, and the populations are
normally distributed.

(c) \(-t_0 = -1.706\), \(t_0 = 1.706\)
Rejection regions: \(t < -1.706\), \(t > 1.706\)

(d) -0.814  (e) Fail to reject \(H_0\).

(f) There is not enough evidence at the 10% level of
significance to reject the teacher’s claim that the mean
scores on the music assessment test are the same for
eighth grade boys and girls.

3. (a) The claim is “the seminar helps adults increase their
credit scores.”
\(H_0: \mu_d \ge 0\); \(H_a: \mu_d < 0\) (claim)

(b) Left-tailed because \(H_a\) contains \(<\); t-test because both
populations are normally distributed and the samples
are dependent.

(c) \(t_0 = -2.718\); Rejection region: \(t < -2.718\)

(d) -5.07 (e) Reject \(H_0\).

(f) There is enough evidence at the 1% level of significance
to support the claim that the seminar helps adults
increase their credit scores.

4. (a) The claim is “the proportion of U.S. adults who approve
of the job the Supreme Court is doing is less than it was
3 years prior.”
\(H_0: p_1 \ge p_2\); \(H_a: p_1 < p_2\) (claim)

(b) Left-tailed because \(H_a\) contains \(<\); z-test because you
are testing proportions, the samples are random, the
samples are independent, and the quantities \(n_1\bar{p}\), \(n_2\bar{p}\),
\(n_1\bar{q}\), and \(n_2\bar{q}\) are at least 5.

(c) \(z_0 = -1.645\); Rejection region: \(z < -1.645\)

(d) -0.48  (e) Fail to reject \(H_0\).

(f) There is not enough evidence at the 5% level of
significance to support the claim that the proportion of
U.S. adults who approve of the job the Supreme Court
is doing is less than it was 3 years prior.


Real Statistics—Real Decisions for Chapter 8 (page 484)

1. (a) Sample answer: Divide the records into groups according
to the inpatients’ ages, and then randomly select records
from each group.

(b) Sample answer: Divide the records into groups according
to geographic regions, and then randomly select records
from each group.

(c) Sample answer: Assign a different number to each
record, randomly choose a starting number, and then
select every 50th record.


(d) Sample answer: Assign a different number to each
record, and then use a table of random numbers to
generate a sample of numbers.

2. (a)-(b) Answers will vary.

3. Use a t-test; independent; yes, you need to know if the
population distributions are normal or not; yes, you need to
know if the population variances are equal or not.

4. There is not enough evidence at the 5% level of significance
to support the claim that there is a difference in the mean
length of hospital stays for inpatients.
This decision does not support the claim.


Cumulative Review for Chapters 6-8 (page 488)

1. (a) (0.786, 0.814)

(b) There is enough evidence at the 5% level of significance
to support the researcher’s claim that more than 75% of
U.S. adults say their household contains a desktop or a
laptop computer.

2. There is enough evidence at the 10% level of significance
to support the claim that the fuel additive improved gas
mileage.

3. (25.94, 28.00); z-distribution
4. (2.59, 4.33); t-distribution
5. (10.7, 13.5); t-distribution
6. (7.85, 8.57); z-distribution

7. \(H_0: \mu \ge 33\); \(H_a: \mu < 33\) (claim)

8. \(H_0: p \ge 0.19\) (claim); \(H_a: p < 0.19\)
9. \(H_0: \sigma = 0.63\) (claim); \(H_a: \sigma \ne 0.63\)
10. \(H_0: \mu = 2.28\); \(H_a: \mu \ne 2.28\) (claim)
11. There is enough evidence at the 10% level of significance to
support the pediatrician’s claim that the mean birth weight
of a single-birth baby is greater than the mean birth weight
of a baby that has a twin.

12. (a) (511.95, 2283.75) (b) (22.63, 47.79)

(c) There is not enough evidence at the 1% level of
significance to reject the travel analyst’s claim that the
standard deviation of the mean room rate for two adults
at three-star hotels in Cincinnati is at most $30.

13. There is enough evidence at the 5% level of significance to
support the organization’s claim that the mean SAT scores
for male athletes and male non-athletes at a college are
different.

14. (a) (49,996.92, 57,039.14)

(b) There is not enough evidence at the 5% level of
significance to reject the researcher’s claim that the
mean annual earnings for locksmiths is $53,000.

15. There is enough evidence at the 10% level of significance
to support the medical research team’s claim that the
proportion of monthly convulsive seizure reduction is
greater for the group that received the extract than for the
group that received the placebo.

16. (a) (41.5, 42.5)

(b) There is enough evidence at the 5% level of significance
to reject the zoologist’s claim that the mean incubation
period for ostriches is at least 45 days.

17. A type I error will occur when the actual proportion of
people who purchase their eyeglasses online is 0.05, but
you reject \(H_0\). A type II error will occur when the actual
proportion of people who purchase their eyeglasses online
is different from 0.05, but you fail to reject \(H_0\).


Chapter 9


Section 9.1 (page 503)

1. Increase; Decrease

3. The sample correlation coefficient r measures the strength
and direction of a linear relationship between two variables;
\(r = -0.932\) indicates a stronger correlation because
\(|-0.932| = 0.932\) is closer to 1 than \(|0.918| = 0.918\).

5. A table can be used to compare r with a critical value, or a
hypothesis test can be performed using a t-test.

7. \(H_0: \rho = 0\) (no significant correlation)
\(H_a: \rho \ne 0\) (significant correlation)
Reject the null hypothesis if t is in the rejection region.

9. Strong negative linear correlation
11. No linear correlation
13. Explanatory variable: Amount of water consumed
Response variable: Weight loss

15. c; You would expect a positive linear correlation between
age and income.
16. d; You would not expect age and height to be correlated.
17. b; You would expect a negative linear correlation between
age and balance on student loans.
18. a; You would expect the relationship between age and body
temperature to be fairly constant.

19. Sample answer: People who can afford more valuable homes
will live longer because they have more money to take care
of themselves.

21. Sample answer: Ice cream sales are higher when the weather
is warm and people are outside more often. This is when
homicide rates go up as well.

23. (a) [Scatter plot; x-axis: Age (in years)]
(b) 0.979

(c) Strong positive linear correlation; As age increases, the number
of words in children’s vocabulary tends to increase.

(d) There is enough evidence at the 1% level of significance
to conclude that there is a significant linear correlation
between children’s ages and number of words in their
vocabulary.

25. (a) [Scatter plot; x-axis: Maximum weight (in kilograms);
y-axis: Jump height (in centimeters)]
(b) 0.756

(c) Strong positive linear correlation; As the maximum
weight for one repetition of a half squat increases, the
jump height tends to increase.

(d) There is enough evidence at the 1% level of significance
to conclude that there is a significant linear correlation
between maximum weight for one repetition of a half
squat and jump height.

27. (a) [Scatter plot; x-axis: Earnings per share (in dollars)]
(b) 0.061

(c) No linear correlation; The earnings per share for the
companies do not appear to be related to their dividends
per share.

(d) There is not enough evidence at the 1% level of
significance to conclude that there is a significant
linear correlation between earnings per share for the
companies and their dividends per share.

29. The correlation coefficient gets stronger, going from
\(r \approx 0.979\) to \(r \approx 0.985\).

31. The correlation coefficient gets weaker, going from
\(r = 0.756\) to \(r \approx 0.666\).

33. There is not enough evidence at the 1% level of significance
to conclude that there is a significant linear correlation
between vehicle weight and the variability in braking
distance on a dry surface.

35. There is enough evidence at the 5% level of significance
to conclude that there is a significant linear correlation
between the maximum weight for one repetition of a half
squat and the jump height.

37. \(r \approx -0.975\); The correlation coefficient remains unchanged
when the x-values and y-values are switched.


Section 9.1 Activity (page 507)

1-4. Answers will vary.
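Exercises 3, 7, and 23-37 of Section 9.1 compute the sample correlation coefficient r. They then test its significance with \(t = r / \sqrt{(1 - r^2)/(n - 2)}\), which has \(n - 2\) degrees of freedom.

The minimal Python sketch below uses the computational formula for r. It assumes paired lists of x- and y-values. The name `correlation_t_test` and the four sample points are hypothetical. Swapping the two lists leaves r unchanged, which is the observation made in Exercise 37.

```python
import math


def correlation_t_test(xs, ys):
    """Return (r, t, df) for testing H0: rho = 0 against Ha: rho != 0.

    r uses the computational formula
        r = (n*Sxy - Sx*Sy) / (sqrt(n*Sx2 - Sx**2) * sqrt(n*Sy2 - Sy**2)),
    and t = r / sqrt((1 - r**2) / (n - 2)) has n - 2 degrees of freedom.
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three (x, y) pairs of equal length")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    spread_x = n * sum_x2 - sum_x ** 2
    spread_y = n * sum_y2 - sum_y ** 2
    if spread_x <= 0 or spread_y <= 0:
        raise ValueError("x-values and y-values must not be constant")
    r = (n * sum_xy - sum_x * sum_y) / (math.sqrt(spread_x) * math.sqrt(spread_y))
    if abs(r) >= 1:
        raise ValueError("perfect correlation: t is undefined")
    t = r / math.sqrt((1 - r * r) / (n - 2))
    return r, t, n - 2


if __name__ == "__main__":
    xs = [1, 2, 3, 4]  # hypothetical data
    ys = [2, 1, 4, 3]
    r, t, df = correlation_t_test(xs, ys)
    r_swapped, _, _ = correlation_t_test(ys, xs)
    print(f"r = {r:.3f}, t = {t:.3f}, d.f. = {df}, swapped r = {r_swapped:.3f}")
```

To decide whether the correlation is significant, compare t with the critical values for \(n - 2\) degrees of freedom, as in part (d) of Exercises 23-27.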


Section 9.2 (page 512)

1. A residual is the difference between the observed y-value of
a data point and the predicted y-value on the regression line
for the x-coordinate of the data point. A residual is positive
when the data point is above the line, negative when the
point is below the line, and zero when the observed y-value
equals the predicted y-value.

3. Substitute a value of x into the equation of a regression line
and solve for \(\hat{y}\).

5. The correlation between variables must be significant.
7. b 8. a 9. 10. c 11. f
12. d 13. b 14. c 15. d 16. a

17. \(\hat{y} = 0.486x - 15.057\)
[Graph; x-axis: Number of athletes]
(a) 21 medals (b) 24 medals
(c) 26 medals
(d) It is not meaningful to predict the value of y for x = 120
because x = 120 is outside the range of the original data.

19. \(\hat{y} = 4.974x + 49.994\)
[Graph; x-axis: Hours spent studying]
(a) 67 (b) 75
(c) 87
(d) It is not meaningful to predict the value of y for x = 14
because x = 14 is outside the range of the original data.

21. \(\hat{y} = -2.044x + 520.668\)
[Graph; x-axis: Heart rate (in beats per minute);
y-axis: QT interval (in milliseconds)]
(a) It is not meaningful to predict the value of y for x = 120
because x = 120 is outside the range of the original
data.
(b) 384 milliseconds
(c) 337 milliseconds
(d) 351 milliseconds

23. \(\hat{y} = 2.979x + 52.476\)
[Graph; x-axis: Calories]
(a) 559 milligrams (b) 350 milligrams
(c) It is not meaningful to predict the value of y for x = 260
because x = 260 is outside the range of the original data.
(d) 678 milligrams

25. \(\hat{y} = 1.870x + 51.360\)
[Graph; x-axis: Shoe size; y-axis: Height (in inches)]
(a) 72.865 inches (b) 66.320 inches
(c) It is not meaningful to predict the value of y for
x = 15.5 because x = 15.5 is outside the range of the
original data.
(d) 70.060 inches

27. Strong positive linear correlation; As the years of experience
of the registered nurses increase, their salaries tend to
increase.

29. No, it is not meaningful to predict a salary for a registered
nurse with 28 years of experience because 28 is outside
the range of the original data.

31. (a) \(\hat{y} = -4.297x + 94.200\)
(b) The point (44, 8) may be an outlier.
(c) The point (44, 8) is not an influential point because the
slopes and y-intercepts of the regression lines with the
point included and without the point included are not
significantly different.

33. (a) \(\hat{y} = 0.139x + 21.024\)
(b)-(c) [Graphs; (c) is a residual plot]
(d) The residual plot shows a pattern because the residuals
do not fluctuate about 0. This implies that the regression
line is not a good representation of the relationship
between the two variables.

35. (a) [Graph]
(b) \(\hat{y} = -0.141x + 14.763\) [Graph; x-axis: Row 2; y-axis: Row 1]
(c) The slope of the line keeps the same sign, but the values
of m and b change.

37. \(\hat{y} = 654.536x - 1214.857\)
39. \(y = 93.028(1.712)^x\)

41. \(\hat{y} = -78.929x + 576.179\)
[Graph; x-axis: Number of hours]


43. y = 782.300x7!?! 


45. \(y = 25.035 + 19.599 \ln x\)

47. The logarithmic equation is a better model for the data. The
graph of the logarithmic equation fits the data better than
the regression line.


Section 9.2 Activity (page 518)

1-4. Answers will vary.
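Exercises 1, 3, and 17-25 of Section 9.2 rely on three ideas:

- the least squares regression line \(\hat{y} = mx + b\);
- residuals \(y_i - \hat{y}_i\);
- the rule that a prediction is not meaningful when x lies outside the range of the original data.

The minimal Python sketch below implements each idea. It assumes plain lists of data. All function names and the sample data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (m, b) for the regression line y_hat = m*x + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - m * sum_x / n  # b = y_bar - m * x_bar
    return m, b


def residuals(xs, ys, m, b):
    """Observed y minus predicted y_hat: positive above the line, negative below."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]


def predict(x, m, b, xs):
    """Predict y_hat, refusing x-values outside the range of the original data."""
    if not min(xs) <= x <= max(xs):
        raise ValueError("x is outside the range of the original data")
    return m * x + b


if __name__ == "__main__":
    xs, ys = [1, 2, 3, 4], [2, 1, 4, 3]  # hypothetical data
    m, b = least_squares_line(xs, ys)
    print(m, b, residuals(xs, ys, m, b), predict(2.5, m, b, xs))
```

Raising an error in `predict`, rather than silently extrapolating, mirrors part (d) of Exercises 17 and 19.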


Section 9.3 (page 526)

1. The total variation is the sum of the squares of the
differences between the y-values of each ordered pair and
the mean of the y-values of the ordered pairs, or \(\sum (y_i - \bar{y})^2\).

3. The unexplained variation is the sum of the squares of the
differences between the observed y-values and the predicted
y-values, or \(\sum (y_i - \hat{y}_i)^2\).

5. Two variables that have perfect positive or perfect negative
linear correlation have a correlation coefficient of 1 or -1,
respectively. In either case, the coefficient of determination
is 1, which means that 100% of the variation in the response
variable is explained by the variation in the explanatory
variable.

7. 0.216; About 21.6% of the variation is explained. About
78.4% of the variation is unexplained.

9. 0.916; About 91.6% of the variation is explained. About
8.4% of the variation is unexplained.

11. (a) 0.798; About 79.8% of the variation in proceeds can
be explained by the relationship between the number
of offerings and proceeds, and about 20.2% of the
variation is unexplained.

(b) 8054.328; The standard error of estimate of the proceeds
for a specific number of offerings is about $8,054,328,000.

13. (a) 0.729; About 72.9% of the variation in points earned can
be explained by the relationship between the number of
goals allowed and points earned, and about 27.1% of the
variation is unexplained.

(b) 9.438; The standard error of estimate of the points
earned for a specific number of goals allowed is about
9.438.

15. (a) 0.651; About 65.1% of the variation in mean annual
wages can be explained by the relationship between
percentages of employment in STEM occupations and
mean annual wages, and about 34.9% of the variation is
unexplained.

(b) 8.141; The standard error of estimate of the mean
annual wages for a specific percentage of employment
in STEM occupations is about $8141.

17. (a) 0.642; About 64.2% of the variation in the quantity of
wheat exported can be explained by the relationship
between the quantity of wheat produced and the
quantity exported, and about 35.8% of the variation is
unexplained.

(b) 1653.623; The standard error of estimate of the quantity
of wheat exported for a specific quantity of wheat
produced is about 1,653,623,000 kilograms per year.

19. (a) 0.816; About 81.6% of the variation in the new-vehicle
sales of General Motors can be explained by the
relationship between the new-vehicle sales of Ford and
General Motors, and about 18.4% of the variation is
unexplained.

(b) 346.341; The standard error of estimate of the new-vehicle
sales of General Motors for a specific amount of
new-vehicle sales of Ford is about 346,341 new vehicles.

21. 40,083.251 < y < 82,572.581
You can be 95% confident that the proceeds will be between
$40,083,251,000 and $82,572,581,000 when the number of
initial offerings is 450 issues.

23. 59.009 < y < 94.665
You can be 90% confident that the total points earned will
be between 59 and 95 when the number of goals allowed
is 250.

25. 36.264 < y < 86.462
You can be 99% confident that the mean annual wage will
be between $36,264 and $86,462 when the percentage of
employment in STEM occupations is 13% in the industry.

27. 2209.419 < y < 9046.581
You can be 80% confident that the quantity of wheat
exported will be between 2209.419 and 9046.581 million
kilograms when the quantity of wheat produced is 99,000
million kilograms per year.

29. 2684.712 < y < 4356.424
You can be 95% confident that the new-vehicle sales of
General Motors will be between 2,684,712 and 4,356,424
when the new-vehicle sales of Ford are 2,628,000.

31. [Graph; x-axis: Cars; marked values: \(\bar{y} = 10.8125\),
\(\hat{y} = 11.05\), x = 11]

33. 0.987; About 98.7% of the variation in the median ages of
light trucks can be explained by the relationship between
the median ages of cars and light trucks, and about 1.3% of
the variation is unexplained.

35. Fail to reject \(H_0\). There is not enough evidence at the 1%
level of significance to support the claim that there is a
linear relationship between weight and number of hours
slept.

37. -175.836 < B < 287.908; 108.928 < M < 290.142
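Exercises 1-29 of Section 9.3 rely on three quantities:

- the coefficient of determination, \(r^2\) = explained variation / total variation;
- the standard error of estimate, \(s_e = \sqrt{\sum (y_i - \hat{y}_i)^2/(n-2)}\);
- the prediction interval \(\hat{y} - E < y < \hat{y} + E\), where \(E = t_c s_e \sqrt{1 + 1/n + n(x_0 - \bar{x})^2 / (n\sum x^2 - (\sum x)^2)}\).

The minimal Python sketch below computes all three. It assumes the critical value \(t_c\) for \(n - 2\) degrees of freedom is looked up separately in a t-table. The name `regression_summary` and the sample data are hypothetical.

```python
import math


def regression_summary(xs, ys, x0, t_c):
    """Return (r_squared, s_e, (lower, upper)) for the least squares line.

    r_squared = explained variation / total variation
    s_e       = sqrt(unexplained variation / (n - 2))
    The prediction interval at x = x0 is y_hat - E < y < y_hat + E with
    E = t_c * s_e * sqrt(1 + 1/n + n*(x0 - x_bar)**2 / (n*Sx2 - Sx**2)).
    t_c is the critical value for n - 2 degrees of freedom, from a t-table.
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three (x, y) pairs of equal length")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    spread_x = n * sum_x2 - sum_x ** 2
    m = (n * sum_xy - sum_x * sum_y) / spread_x
    x_bar, y_bar = sum_x / n, sum_y / n
    b = y_bar - m * x_bar
    y_hats = [m * x + b for x in xs]

    total = sum((y - y_bar) ** 2 for y in ys)
    explained = sum((yh - y_bar) ** 2 for yh in y_hats)
    unexplained = sum((y - yh) ** 2 for y, yh in zip(ys, y_hats))

    r_squared = explained / total
    s_e = math.sqrt(unexplained / (n - 2))
    margin = t_c * s_e * math.sqrt(1 + 1 / n + n * (x0 - x_bar) ** 2 / spread_x)
    y0 = m * x0 + b
    return r_squared, s_e, (y0 - margin, y0 + margin)


if __name__ == "__main__":
    # Hypothetical data; 4.303 is the t-table value for a 95% level with 2 d.f.
    r2, s_e, (lo, hi) = regression_summary([1, 2, 3, 4], [2, 1, 4, 3], 2.5, 4.303)
    print(f"r^2 = {r2:.3f}, s_e = {s_e:.3f}, interval = ({lo:.3f}, {hi:.3f})")
```

The explained and unexplained variations always add up to the total variation. This is why \(1 - r^2\) gives the unexplained share quoted in Exercises 7-19.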


Section 9.4 (page 534)

1. (a) 18,832.7 pounds per acre
(b) 18,016.4 pounds per acre
(c) 17,350.6 pounds per acre
(d) 16,190.3 pounds per acre

3. (a) 7.5 cubic feet
(b) 16.8 cubic feet
(c) 51.9 cubic feet
(d) 62.1 cubic feet

5. (a) \(\hat{y} = 17{,}899 - 606.58x_1 - 52.9x_2\)
(b) 564.314  (c) 0.966
The standard error of estimate of the predicted price given
specific age and mileage of pre-owned Honda Civic Sedans is
about $564.31. The multiple regression model explains about
96.6% of the variation.

7. 0.955; About 95.5% of the variation in y can be explained by
the relationship between variables; \(r^2_{\text{adj}} < r^2\).


Uses and Abuses for Chapter 9 (page 536)

1-2. Answers will vary.


Review Exercises for Chapter 9 (page 538)

1. (a) [Scatter plot; x-axis: Pass attempts]
(b) 0.917
(c) Strong positive linear correlation; As the number of
pass attempts increases, the number of passing yards
tends to increase.

3. (a) [Scatter plot; axis label: IQ score]
(b) 0.338
(c) Weak positive linear correlation; IQ does not
appear to be related to brain size.

5. There is enough evidence at the 5% level of significance
to conclude that there is a significant linear correlation
between a quarterback’s pass attempts and passing yards.

7. There is not enough evidence at the 1% level of significance
to conclude that there is a significant linear correlation
between IQ and brain size.

9. \(\hat{y} = 0.106x - 781.327\)
[Graph; x-axis: Milk cows (in thousands)]
(a) It is not meaningful to predict the value of y for
x = 9080 because x = 9080 is outside the range of the
original data.
(b) 197.053 billion pounds
(c) 199.173 billion pounds
(d) 204.473 billion pounds

11. \(\hat{y} = -0.086x + 10.450\)
[Graph; y-axis: Hours of sleep]
(a) It is not meaningful to predict the value of y when
x = 16 because x = 16 is outside the range of the
original data.
(b) 8.3 hours
(c) It is not meaningful to predict the value of y when
x = 85 because x = 85 is outside the range of the
original data.
(d) 6.15 hours

13. 0.203; About 20.3% of the variation is explained. About
79.7% of the variation is unexplained.

15. 0.412; About 41.2% of the variation is explained. About
58.8% of the variation is unexplained.

17. (a) 0.690; About 69.0% of the variation in top speed
for hybrid and electric cars can be explained by the
relationship between their fuel efficiencies and top
speeds, and about 31.0% of the variation is unexplained.
(b) 5.851; The standard error of estimate of the top speed
for hybrid and electric cars for a specific fuel efficiency
is about 5.851 miles per hour.

19. 193.364 < y < 210.282
You can be 90% confident that the amount of milk produced
will be between 193.364 billion pounds and 210.282 billion
pounds when the average number of cows is 9275.

21. 4.866 < y < 8.294
You can be 95% confident that the hours slept will be between
4.866 and 8.294 hours for an adult who is 45 years old.

23. 75.349 < y < 119.817
You can be 99% confident that the top speed of a hybrid
or electric car will be between 75.349 and 119.817 miles per
hour when the combined city and highway fuel efficiency is
90 miles per gallon equivalent.


25. 


27. 


Quiz for Chapter 9 


1. 


NN 


+ 


(a) § = 3.6738 + 1.2874x, — 7.531x2 

(b) 0.710; The standard error of estimate of the predicted 
carbon monoxide content given specific tar and nicotine 
contents is about 0.710 milligram. 

(c) 0.943; The multiple regression model explains about 
94.3% of the variation in y. 

(a) 21.705 miles per gallon 

(b) 25.21 miles per gallon 

(c) 30.1 miles per gallon 

(d) 25.86 miles per gallon 


(page 542) 


Elementary school 
teachers’ salaries 
(in thousands of dollars) 
ns 
T 
e 


50 52 54 56 58 60 62 


Secondary school 
teachers’ salaries 
(in thousands of dollars) 


The data appear to have a positive linear correlation. As x 
increases, y tends to increase. 

0.992; Strong positive linear correlation; As the average 
annual salaries of secondary school teachers increase, the 
average annual salaries of elementary school teachers tend 
to increase. 

Reject Hp. There is enough evidence at the 5% level 
of significance to conclude that there is a significant 
linear correlation between the average annual salaries of 
secondary school teachers and the average annual salaries 
of elementary school teachers. 

¥ = 0.997x — 1.960 


y 


Elementary school 
teachers’ salaries 


(in thousands of dollars) 
ar 
nN 
t 


50 52 54 56 58 60 62 


Secondary school 
teachers’ salaries 
(in thousands of dollars) 


$50,382.50 

0.984; About 98.4% of the variation in the average annual 
salaries of elementary school teachers can be explained 
by the relationship between the average annual salaries of 
secondary school teachers and elementary school teachers, 
and about 1.6% of the variation is unexplained. 

0.422; The standard error of estimate of the average annual 
salaries of elementary school teachers for a specific average 
annual salary of secondary school teachers is about $422. 


9. (a) $95.26 


Real Statistics—Real Decisions for Chapter 9 
1. (a) 


ODD ANSWERS A83 


8. 49.311 < y < 51.455 


You can be 95% confident that the average annual salary 
of elementary school teachers will be between $49,311 and 
$51,455 when the average annual salary of secondary school 
teachers is $52,500. 

(b) $70.28 


(c) $67.74 (d) $59.46 


(page 544) 


€ y 

Bs : 

ge A 

Be set e 
=| 56 + 

>3e 2 

mes 47 e 

SSeS sot ° 

Ss oOo YO 

oso & 50+ 

ok 2 

ag = 87 @ oe 

Pe a 67 os 

Smt 4% 

ie) 

55 x 

Be 20 30 40 50 60 70 80 


Annual average of daily maximum 
sulfer dioxide concentration 
(in parts per billion) 
It appears that there is a positive linear correlation. As 
the annual average of the daily maximum sulfur dioxide 
concentration increases, the annual average of the 
daily maximum nitrogen dioxide concentration tends to 
increase. 

(b) 0.966; There is a strong positive linear correlation. 

(c) There is enough evidence at the 5% level of significance 
to conclude that there is a significant linear correlation 
between annual averages of the daily maximum 
concentrations of sulfur dioxide and nitrogen dioxide. 

(d) ŷ = 0.234x + 38.081

[Graph: regression line over the scatter plot; x-axis: Annual average of daily maximum sulfur dioxide concentration (in parts per billion), 20 to 80; y-axis: Annual average of daily maximum nitrogen dioxide concentration (in parts per billion)]

(e) Yes, for x-values that are within the range of the data set. 

(f) r² ≈ 0.934; About 93.4% of the variation in nitrogen dioxide concentrations can be explained by the variation in sulfur dioxide concentrations, and about 6.6% of the variation is unexplained.
s_e ≈ 1.178; The standard error of estimate of the annual averages of the daily maximum concentration of nitrogen dioxide for a specific annual average of the daily maximum concentration of sulfur dioxide is about 1.178 parts per billion.


2. 41.733 < y < 47.533

You can be 95% confident that the annual average of the daily maximum nitrogen dioxide concentration will be between 41.733 and 47.533 parts per billion when the annual average of the daily maximum sulfur dioxide concentration is 28 parts per billion.
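A prediction interval has the form ŷ − E < y < ŷ + E, so its midpoint should equal the regression prediction. The short Python check below assumes only the numbers printed in parts (d) and 2: it evaluates ŷ = 0.234x + 38.081 at x = 28 and recovers the midpoint and the maximum error of estimate E from the interval endpoints.

```python
slope, intercept = 0.234, 38.081  # regression equation from part (d)
lower, upper = 41.733, 47.533     # 95% prediction interval from part 2

y_hat = slope * 28 + intercept    # point prediction at x = 28 ppb
center = (lower + upper) / 2      # midpoint of the interval
margin = (upper - lower) / 2      # maximum error of estimate E

print(f"y-hat = {y_hat:.3f}, midpoint = {center:.3f}, E = {margin:.3f}")
```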




Chapter 10 


Section 10.1 (page 554)

1. A multinomial experiment is a probability experiment consisting of a fixed number of independent trials in which there are more than two possible outcomes for each trial. The probability of each outcome is fixed, and each outcome is classified into categories.

3. 50  5. 122.5

7. (a) H₀: The distribution of the ages of moviegoers is 23% ages 2-17, 20% ages 18-24, 22% ages 25-39, 9% ages 40-49, and 26% ages 50+. (claim)
Hₐ: The distribution of ages differs from the expected distribution.
(b) χ²₀ = 7.779; Rejection region: χ² > 7.779
(c) 6.244  (d) Fail to reject H₀.
(e) There is not enough evidence at the 10% level of significance to reject the claim that the distribution of the ages of moviegoers and the expected distribution are the same.

9. (a) H₀: The distribution of the days people order food for delivery is 7% Sunday, 4% Monday, 6% Tuesday, 13% Wednesday, 10% Thursday, 36% Friday, and 24% Saturday.
Hₐ: The distribution of days differs from the expected distribution. (claim)
(b) χ²₀ = 16.812; Rejection region: χ² > 16.812
(c) 17.595  (d) Reject H₀.
(e) There is enough evidence at the 1% level of significance to conclude that the distribution of days differs from the expected distribution.

11. (a) H₀: The distribution of the number of homicide crimes in California by county is uniform. (claim)
Hₐ: The distribution of homicides by county is not uniform.
(b) χ²₀ = 30.578; Rejection region: χ² > 30.578
(c) 143.904  (d) Reject H₀.
(e) There is enough evidence at the 1% level of significance to reject the claim that the distribution of the number of homicide crimes in California by county is uniform.

13. (a) H₀: The distribution of the opinions of U.S. parents on whether a college education is worth the expense is 55% strongly agree, 30% somewhat agree, 5% neither agree nor disagree, 6% somewhat disagree, and 4% strongly disagree.
Hₐ: The distribution of opinions differs from the expected distribution. (claim)
(b) χ²₀ = 9.488; Rejection region: χ² > 9.488
(c) 65.236  (d) Reject H₀.
(e) There is enough evidence at the 5% level of significance to conclude that the distribution of the opinions of U.S. parents on whether a college education is worth the expense differs from the expected distribution.
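Each goodness-of-fit answer above compares a statistic of the form χ² = Σ (O − E)²/E, where each expected frequency is E = np, with the critical value χ²₀ for d.f. = k − 1. A minimal Python sketch of that statistic follows; the function name and the three-category counts are hypothetical, chosen only to show the arithmetic.

```python
from typing import Sequence


def chi_square_gof(observed: Sequence[int], proportions: Sequence[float]) -> float:
    """Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E with E = n * p."""
    if len(observed) != len(proportions):
        raise ValueError("need one expected proportion per category")
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("expected proportions must sum to 1")
    n = sum(observed)
    total = 0.0
    for o, p in zip(observed, proportions):
        e = n * p  # expected frequency for this category
        total += (o - e) ** 2 / e
    return total


# Hypothetical counts for three equally likely categories (not from the text).
stat = chi_square_gof([10, 20, 30], [1 / 3, 1 / 3, 1 / 3])
print(round(stat, 3))  # compare with chi-square critical value for d.f. = 3 - 1 = 2
```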


15. (a) H₀: The distribution of prospective visitors by the purpose of their visit is uniform.
Hₐ: The distribution of prospective visitors by the purpose of their visit is not uniform. (claim)
(b) χ²₀ = 9.210; Rejection region: χ² > 9.210
(c) 228.34  (d) Reject H₀.
(e) There is enough evidence at the 1% level of significance to conclude that the distribution of prospective visitors by the purpose of their visit is not uniform.

17. (a) The expected frequencies are 17, 63, 79, 34, and 5.
(b) χ²₀ = 13.277; Rejection region: χ² > 13.277
(c) 0.613  (d) Fail to reject H₀.
(e) There is not enough evidence at the 1% level of significance to reject the claim that the test scores are normally distributed.


Section 10.2 (page 564)

1. Find the sum of the row and the sum of the column in which the cell is located. Find the product of these sums. Divide the product by the sample size.
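The rule in answer 1 can be written as E(r, c) = (sum of row r)(sum of column c)/n. A minimal Python sketch, applied to the observed counts from Exercise 9 (rows: high school, middle school; columns: marking system, grading system, no preference), should reproduce the expected frequencies shown in parentheses in that table. The helper name is hypothetical.

```python
def expected_frequencies(table):
    """E[r][c] = (row r total) * (column c total) / sample size."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Observed counts from Exercise 9.
observed = [[225, 95, 35], [150, 75, 10]]
for row in expected_frequencies(observed):
    print([round(e, 2) for e in row])
```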


3. Sample answer: For both the chi-square independence test and the chi-square goodness-of-fit test, you are testing a claim about data that are in categories. However, the chi-square goodness-of-fit test has only one data value per category, while the chi-square independence test has multiple data values per category.

Both tests compare observed and expected frequencies. However, the chi-square goodness-of-fit test simply compares the distributions, whereas the chi-square independence test compares them and then draws a conclusion about the dependence or independence of the variables.

5. False. If the two variables of a chi-square independence test are dependent, then you can expect a large difference between the observed frequencies and the expected frequencies.
7. (a)-(b)
                Athlete has
Result      | Stretched    | Not stretched | Total
Injury      | 25 (32.74)   | 33 (25.26)    | 58
No injury   | 220 (212.26) | 156 (163.74)  | 376
Total       | 245          | 189           | 434

9. (a)-(b)
                                  Preference
Students      | Marking system | Grading system | No preference | Total
High school   | 225 (225.64)   | 95 (102.29)    | 35 (27.08)    | 355
Middle school | 150 (149.36)   | 75 (67.71)     | 10 (17.92)    | 235
Total         | 375            | 170            | 45            | 590

11. (a)-(b)
                                  Type of book
Gender | Science fiction | Crime      | Romance    | Mythology  | Total
Male   | 65 (61.82)      | 55 (59.24) | 20 (25.76) | 30 (23.18) | 170
Female | 55 (58.18)      | 60 (55.76) | 30 (24.24) | 15 (21.82) | 160
Total  | 120             | 115        | 50         | 45         | 330


13. (a) H₀: An athlete's injury result is independent of whether or not the athlete has stretched. (claim)
Hₐ: An athlete's injury result is dependent on whether or not the athlete has stretched.
(b) d.f. = 1; χ²₀ = 3.841; Rejection region: χ² > 3.841
(c) 4.8495  (d) Reject H₀.
(e) There is enough evidence at the 5% level of significance to reject the claim that an athlete's injury result is independent of whether or not the athlete has stretched.

15. (a) H₀: The result is independent of the type of treatment.
Hₐ: The result is dependent on the type of treatment. (claim)
(b) d.f. = 1; χ²₀ = 2.706; Rejection region: χ² > 2.706
(c) 12.478  (d) Reject H₀.
(e) There is enough evidence at the 10% level of significance to conclude that the result is dependent on the type of treatment.

17. (a) H₀: The number of times former smokers tried to quit is independent of gender.
Hₐ: The number of times former smokers tried to quit is dependent on gender. (claim)
(b) d.f. = 2; χ²₀ = 5.991; Rejection region: χ² > 5.991
(c) 0.002  (d) Fail to reject H₀.
(e) There is not enough evidence at the 5% level of significance to conclude that the number of times former smokers tried to quit is dependent on gender.

19. (a) H₀: Reasons are independent of the type of worker.
Hₐ: Reasons are dependent on the type of worker. (claim)
(b) d.f. = 2; χ²₀ = 9.210; Rejection region: χ² > 9.210
(c) 7.326  (d) Fail to reject H₀.
(e) There is not enough evidence at the 1% level of significance to conclude that reasons for continuing education are dependent on the type of worker.

21. (a) H₀: A family borrowing money for college is independent of race.
Hₐ: A family borrowing money for college is dependent on race. (claim)
(b) d.f. = 2; χ²₀ = 9.210; Rejection region: χ² > 9.210
(c) 5.994  (d) Fail to reject H₀.
(e) There is not enough evidence at the 1% level of significance to conclude that a family borrowing money for college is dependent on race.
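The statistic in Exercise 13(c) can be checked from the observed counts in the Exercise 7 table using χ² = Σ (O − E)²/E over every cell, with d.f. = (rows − 1)(columns − 1). In the plain-Python sketch below (function name hypothetical), rounding each expected frequency to two decimals first, as the table displays them, gives about 4.8495; unrounded expected frequencies give about 4.852. Both exceed χ²₀ = 3.841, so the decision is the same either way.

```python
def chi_square_independence(table, decimals=None):
    """Return (chi-square statistic, d.f.) for an r x c contingency table.

    If `decimals` is given, each expected frequency is rounded first,
    which mimics a hand calculation from a printed table.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if decimals is not None:
                expected = round(expected, decimals)
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Observed counts from Exercise 7: rows = injury, no injury;
# columns = stretched, not stretched.
athlete = [[25, 33], [220, 156]]
print(chi_square_independence(athlete, decimals=2))  # about 4.8495, d.f. = 1
print(chi_square_independence(athlete))              # about 4.852, d.f. = 1
```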


23. (a) H₀: Type of crash is independent of the type of vehicle.
Hₐ: Type of crash is dependent on the type of vehicle. (claim)
(b) d.f. = 2; χ²₀ = 5.991; Rejection region: χ² > 5.991
(c) 103.568  (d) Reject H₀.
(e) There is enough evidence at the 5% level of significance to conclude that the type of crash is dependent on the type of vehicle.

25. (a) H₀: Preferred method of assessment is independent of school level.
Hₐ: Preferred method of assessment is dependent on school level. (claim)
(b) d.f. = 2; χ²₀ = 9.210; Rejection region: χ² > 9.210
(c) 7.126  (d) Fail to reject H₀.
(e) There is not enough evidence at the 1% level of significance to conclude that preferred method of assessment is dependent on school level.

27. (a) H₀: Type of book is independent of gender of reader.
Hₐ: Type of book is dependent on gender of reader. (claim)
(b) d.f. = 3; χ²₀ = 7.815; Rejection region: χ² > 7.815
(c) 7.758  (d) Fail to reject H₀.
(e) There is not enough evidence at the 5% level of significance to conclude that type of book is dependent on gender of the reader.

29. Fail to reject H₀. There is not enough evidence at the 5% level of significance to reject the claim that the proportions of motor vehicle crash deaths involving males or females are the same for each age group.


31. Right-tailed

33.
                                     Educational Attainment
Status             | Not a high school graduate | High school graduate | Some college, no degree | Associate's, bachelor's, or advanced degree
Employed           | 0.046                      | 0.156                | 0.101                   | 0.312
Unemployed         | 0.004                      | 0.010                | 0.005                   | 0.009
Not in labor force | 0.059                      | 0.123                | 0.061                   | 0.114

35. (a) 0.9%  (b) 6.1%

37.
                                     Educational Attainment
Status             | Not a high school graduate | High school graduate | Some college, no degree | Associate's, bachelor's, or advanced degree
Employed           | 0.076                      | 0.253                | 0.164                   | 0.507
Unemployed         | 0.150                      | 0.350                | 0.183                   | 0.317
Not in labor force | 0.164                      | 0.344                | 0.172                   | 0.320

39. 17.2%  41. 26.3%
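The table in Exercise 37 gives conditional relative frequencies by row: each entry is divided by its row total so that every row sums to 1. The Python sketch below applies that step to the joint relative frequencies from Exercise 33. Because these inputs are already rounded to three decimals (rather than the original counts), some results may differ from the printed table in the third decimal place. The dictionary layout and function name are hypothetical.

```python
joint = {
    "Employed": [0.046, 0.156, 0.101, 0.312],
    "Unemployed": [0.004, 0.010, 0.005, 0.009],
    "Not in labor force": [0.059, 0.123, 0.061, 0.114],
}


def conditional_by_row(table):
    """Divide each entry by its row total so every row sums to 1."""
    result = {}
    for status, row in table.items():
        total = sum(row)
        result[status] = [value / total for value in row]
    return result


for status, row in conditional_by_row(joint).items():
    print(status, [round(v, 3) for v in row])
```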


Section 10.3 (page 577)

1. Specify the level of significance α. Determine the degrees of freedom for the numerator and denominator. Use Table 7 in Appendix B to find the critical value F.

3. (1) The samples must be random, (2) the samples must be independent, and (3) each population must have a normal distribution.

5. 2.54  7. 2.06  9. 9.16  11. 1.80

13. Fail to reject H₀. There is not enough evidence at the 10% level of significance to support the claim.

15. Fail to reject H₀. There is not enough evidence at the 1% level of significance to reject the claim.

17. Reject H₀. There is enough evidence at the 1% level of significance to reject the claim.

19. (a) H₀: σ₁² = σ₂²; Hₐ: σ₁² > σ₂² (claim)
(b) F₀ = 3.53; Rejection region: F > 3.53
(c) 1.895  (d) Fail to reject H₀.
(e) There is not enough evidence at the 1% level of significance to support City A's claim that its variance of drunk-driving accidents is less than the variance of drunk-driving accidents in City B.

21. (a) H₀: σ₁² = σ₂²; Hₐ: σ₁² ≠ σ₂² (claim)
(b) F₀ = 4.82; Rejection region: F > 4.82
(c) 2.58  (d) Fail to reject H₀.
(e) There is not enough evidence at the 5% level of significance to conclude that the variances of the waiting times differ between the two age groups.

23. (a) H₀: σ₁² = σ₂² (claim); Hₐ: σ₁² ≠ σ₂²
(b) F₀ = 2.635; Rejection region: F > 2.635
(c) 1.282  (d) Fail to reject H₀.
(e) There is not enough evidence at the 10% level of significance to reject the administrator's claim that the standard deviations of science assessment test scores for eighth-grade students are the same in Districts 1 and 2.

25. (a) H₀: σ₁² = σ₂²; Hₐ: σ₁² > σ₂² (claim)
(b) F₀ = 1.59; Rejection region: F > 1.59
(c) 1.31  (d) Fail to reject H₀.
(e) There is not enough evidence at the 5% level of significance to conclude that the standard deviation of the annual salaries for actuaries is less in California than in New York.

27. Right-tailed: 14.73; Left-tailed: 0.15  29. (0.340, 3.422)


Section 10.4 (page 587)

1. H₀: μ₁ = μ₂ = μ₃ = ⋯ = μₖ
Hₐ: At least one of the means is different from the others.

3. The MS_B measures the differences related to the treatment given to each sample. The MS_W measures the differences related to entries within the same sample.

5. (a) H₀: μ₁ = μ₂ = μ₃
Hₐ: At least one mean is different from the others. (claim)
(b) F₀ = 3.89; Rejection region: F > 3.89
(c) 4.80  (d) Reject H₀.

(e) There is enough evidence at the 5% level of significance to conclude that at least one mean cost per ounce is different from the others.

7. (a) H₀: μ₁ = μ₂ = μ₃
Hₐ: At least one mean is different from the others. (claim)
(b) F₀ = 6.36; Rejection region: F > 6.36
(c) 16.11  (d) Reject H₀.
(e) There is enough evidence at the 1% level of significance to conclude that at least one mean vacuum cleaner weight is different from the others.

9. (a) H₀: μ₁ = μ₂ = μ₃ = μ₄
Hₐ: At least one mean is different from the others. (claim)
(b) F₀ = 2.84; Rejection region: F > 2.84
(c) 0.62  (d) Fail to reject H₀.
(e) There is not enough evidence at the 5% level of significance to conclude that at least one mean age is different from the others.

11. (a) H₀: μ₁ = μ₂ = μ₃ = μ₄ (claim)
Hₐ: At least one mean is different from the others.
(b) F₀ = 2.28; Rejection region: F > 2.28
(c) 3.67  (d) Reject H₀.
(e) There is enough evidence at the 10% level of significance to reject the claim that the mean scores are the same for all regions.

13. (a) H₀: μ₁ = μ₂ = μ₃ = μ₄ = μ₅ = μ₆
Hₐ: At least one mean is different from the others. (claim)
(b) F₀ = 2.53; Rejection region: F > 2.53
(c) 2.06  (d) Fail to reject H₀.
(e) There is not enough evidence at the 5% level of significance to conclude that the mean salary is different in at least one of the areas.

15. Fail to reject all null hypotheses. The interaction between the advertising medium and the length of the ad has no effect on the rating and therefore there is no significant difference in the means of the ratings.

17. Fail to reject all null hypotheses. The interaction between age and gender has no effect on GPA and therefore there is no significant difference in the means of the GPAs.

19. CV_Scheffé = 7.78
(1, 2) → 8.05 → Significant difference
(1, 3) → 0.01 → No difference
(2, 3) → 6.13 → No difference

21. CV_Scheffé = 10.98
(1, 2) → 34.18 → Significant difference
(1, 3) → 64.14 → Significant difference
(2, 3) → 4.67 → No difference


Uses and Abuses for Chapter 10 (page 592)


1-2. Answers will vary. 


Review Exercises for Chapter 10 (page 594)

1. (a) H₀: The distribution of the lengths of office visits is 4% less than 9 minutes, 24% 10-12 minutes, 26% 13-16 minutes, 22% 17-20 minutes, 6% 21-24 minutes, and 18% 25 or more minutes.
Hₐ: The distribution of the lengths differs from the expected distribution. (claim)
(b) χ²₀ = 15.086; Rejection region: χ² > 15.086
(c) 18.770  (d) Reject H₀.
(e) There is enough evidence at the 1% level of significance to conclude that the distribution of the lengths differs from the expected distribution.

3. (a) H₀: The distribution of responses from golf students about what they need the most help with is 22% approach and swing, 9% driver shots, 4% putting, and 65% short-game shots. (claim)
Hₐ: The distribution of responses differs from the expected distribution.
(b) χ²₀ = 7.815; Rejection region: χ² > 7.815
(c) 0.503  (d) Fail to reject H₀.
(e) There is not enough evidence at the 5% level of significance to reject the claim that the distribution of golf students' responses is the same as the expected distribution.

5. (a) E1,1 ≈ 95.4, E1,2 ≈ 349.2, E1,3 ≈ 383.4, E1,4 ≈ 222.0, E2,1 ≈ 222.6, E2,2 ≈ 814.8, E2,3 ≈ 894.6, E2,4 ≈ 518.0
(b) H₀: Years of full-time teaching experience are independent of gender.
Hₐ: Years of full-time teaching experience are dependent on gender. (claim)
(c) d.f. = 3; χ²₀ = 11.345; Rejection region: χ² > 11.345
(d) 3.815  (e) Fail to reject H₀.
(f) There is not enough evidence at the 1% level of significance to conclude that years of full-time teaching experience are dependent on gender.

7. (a) E1,1 ≈ 136.8, E1,2 ≈ 121.0, E1,3 ≈ 46.4, E1,4 ≈ 23.6, E1,5 ≈ 65.2, E2,1 ≈ 37.2, E2,2 ≈ 33.0, E2,3 ≈ 12.6, E2,4 ≈ 6.4, E2,5 ≈ 17.8
(b) H₀: A species' status is independent of vertebrate group. (claim)
Hₐ: A species' status is dependent on vertebrate group.
(c) d.f. = 4; χ²₀ = 9.488; Rejection region: χ² > 9.488
(d) 48.438  (e) Reject H₀.
(f) There is enough evidence at the 5% level of significance to reject the claim that a species' status (endangered or threatened) is independent of vertebrate group.

9. 2.295  11. 2.39  13. 2.06  15. 2.08

17. (a) H₀: σ₁ = σ₂ (claim); Hₐ: σ₁ ≠ σ₂
(b) F₀ = 2.575; Rejection region: F > 2.575
(c) 2.905  (d) Reject H₀.
(e) There is enough evidence at the 1% level of significance to reject the claim that the standard deviations of hotel room rates for San Francisco, CA, and Sacramento, CA, are the same.
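The two-sample F-test used in Exercise 17 and in Section 10.3 places the larger sample variance in the numerator, F = s₁²/s₂², with d.f.N = n₁ − 1 and d.f.D = n₂ − 1; the statistic is then compared with a critical value from the F-table (for a two-tailed test, the right-tail area used is α/2). A minimal Python sketch, with hypothetical samples rather than the hotel-rate data:

```python
import statistics


def f_statistic(sample_a, sample_b):
    """Two-sample F statistic with the larger sample variance in the numerator.

    Returns (F, d.f. numerator, d.f. denominator).
    """
    var_a = statistics.variance(sample_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1


# Hypothetical samples (not from the text).
f, df_n, df_d = f_statistic([2, 4, 6, 8], [3, 4, 5, 6])
print(f, df_n, df_d)
```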


19. (a) H₀: σ₁² = σ₂²
Hₐ: σ₁² ≠ σ₂² (claim)
(b) F₀ = 5.32; Rejection region: F > 5.32
(c) 1.37  (d) Fail to reject H₀.
(e) There is not enough evidence at the 1% level of significance to support the claim that the test score variance for females is different from that for males.

21. (a) H₀: μ₁ = μ₂ = μ₃ = μ₄
Hₐ: At least one mean is different from the others. (claim)
(b) F₀ = 2.29; Rejection region: F > 2.29
(c) 6.19  (d) Reject H₀.
(e) There is enough evidence at the 10% level of significance to conclude that at least one mean amount spent on energy is different from the others.


Quiz for Chapter 10 (page 598)

1. (a) H₀: The distribution of educational attainment for people in the United States ages 30-34 is 4.7% none-8th grade, 6.9% 9th-11th grade, 29.5% high school graduates, 16.6% some college, no degree, 9.8% associate's degree, 20.5% bachelor's degree, 8.7% master's degree, and 3.3% professional/doctoral degree.
Hₐ: The distribution of educational attainment for people in the United States ages 30-34 differs from the distribution for people ages 25 and older. (claim)
(b) χ²₀ = 14.067; Rejection region: χ² > 14.067
(c) 6.026  (d) Fail to reject H₀.
(e) There is not enough evidence at the 5% level of significance to conclude that the distribution for people in the United States ages 30-34 differs from the distribution for people ages 25 and older.


2. (a) H₀: Age and educational attainment are independent.
Hₐ: Age and educational attainment are dependent. (claim)
(b) χ²₀ = 18.475; Rejection region: χ² > 18.475
(c) 8.187  (d) Fail to reject H₀.
(e) There is not enough evidence at the 1% level of significance to conclude that educational attainment is dependent on age.

3. (a) H₀: σ₁² = σ₂²
Hₐ: σ₁² ≠ σ₂² (claim)
(b) F₀ = 4.43; Rejection region: F > 4.43
(c) 1.38  (d) Fail to reject H₀.
(e) There is not enough evidence at the 1% level of significance to conclude that the variances in annual wages for Ithaca, NY, and Little Rock, AR, are different.

4. (a) H₀: μ₁ = μ₂ = μ₃ (claim)
Hₐ: At least one mean is different from the others.
(b) F₀ = 2.44; Rejection region: F > 2.44
(c) 6.18  (d) Reject H₀.
(e) There is enough evidence at the 10% level of significance to reject the claim that the mean annual wages are the same for all three cities.
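When a one-way ANOVA rejects H₀, as in Quiz 4, a Scheffé test can show which pairs of means differ. The Python sketch below first computes F = MS_B/MS_W, then, for each pair (a, b), the statistic F_s = (x̄_a − x̄_b)² / [MS_W(1/n_a + 1/n_b)], comparing it with CV_Scheffé = (k − 1) × F₀. The data are hypothetical, the function names are illustrative, and the critical value F₀ = 5.14 (α = 0.05, d.f. = (2, 6)) is passed in as if read from an F-table rather than computed.

```python
from itertools import combinations
from statistics import mean


def one_way_anova(groups):
    """Return (F, d.f.N, d.f.D, MS_W) for a one-way ANOVA."""
    k = len(groups)
    big_n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / big_n  # grand mean
    ss_b = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_w = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    ms_b = ss_b / (k - 1)
    ms_w = ss_w / (big_n - k)
    return ms_b / ms_w, k - 1, big_n - k, ms_w


def scheffe_pairs(groups, ms_w, f_critical):
    """Compare F_s for every pair of groups with CV = (k - 1) * f_critical."""
    cv = (len(groups) - 1) * f_critical
    results = {}
    for a, b in combinations(range(len(groups)), 2):
        ga, gb = groups[a], groups[b]
        f_s = (mean(ga) - mean(gb)) ** 2 / (ms_w * (1 / len(ga) + 1 / len(gb)))
        results[(a + 1, b + 1)] = (f_s, f_s > cv)  # True = significant difference
    return cv, results


groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # hypothetical data
F, df_b, df_w, ms_w = one_way_anova(groups)
print(F, df_b, df_w)
cv, pairs = scheffe_pairs(groups, ms_w, f_critical=5.14)
for pair, (f_s, significant) in pairs.items():
    print(pair, round(f_s, 2), "significant" if significant else "no difference")
```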




Real Statistics—Real Decisions for Chapter 10 (page 600)

1. Fail to reject H₀. There is not enough evidence at the 1% level of significance to conclude that the distribution of responses differs from the expected distribution.

2. (a) E1,1 = 15, E1,2 = 120, E1,3 = 165, E1,4 = 185, E1,5 = 135, E1,6 = 115, E1,7 = 155, E1,8 = 110,
E2,1 = 15, E2,2 = 120, E2,3 = 165, E2,4 = 185, E2,5 = 135, E2,6 = 115, E2,7 = 155, E2,8 = 110
(b) There is enough evidence at the 1% level of significance to conclude that the ages of the victims are related to the type of fraud.
Cumulative Review for Chapters 9 and 10 (page 602)

1. (a) [Scatter plot: Women's time (in seconds) versus Men's time (in seconds); x-axis from 9.6 to 10.8]


r ≈ 0.827; strong positive linear correlation

(b) There is enough evidence at the 5% level of significance 
to conclude that there is a significant linear correlation 
between the men’s and women’s winning 100-meter 
times. 

(c) ŷ = 1.216x − 1.088

[Graph: regression line over the scatter plot; x-axis: Men's time (in seconds), 9.6 to 10.8; y-axis: Women's time (in seconds)]

(d) 10.95 seconds 

2. There is enough evidence at the 10% level of significance to 
reject the claim that the mean expenditures are the same for 
all four regions. 

3. (a) 12,442 pounds per acre 
(b) 12,217 pounds per acre 

4. There is not enough evidence at the 10% level of significance 
to reject the administrator’s claim that the standard 
deviations of reading test scores for eighth-grade students 
are the same in Colorado and Utah. 

5. There is not enough evidence at the 5% level of significance 
to reject the claim that the distributions are the same. 


6. There is not enough evidence at the 5% level of significance 
to conclude that the adults’ ratings of the movie are 
dependent on gender. 

7. (a) 0.751; About 75.1% of the variation in height can be 
explained by the relationship between metacarpal bone 
length and height, and about 24.9% of the variation is 
unexplained. 

(b) 3.87; The standard error of estimate of the height for a 
specific metacarpal bone length is about 3.87 centimeters. 

(c) 170.015 < y < 189.446; You can be 95% confident that 
the height will be between 170.015 centimeters and 
189.446 centimeters when the metacarpal bone length is 
50 centimeters. 


Appendix C (page A30)


1. The observed values are usually plotted along the horizontal axis. The expected z-scores are plotted along the vertical axis.


3. Because the points appear to follow a nonlinear pattern, you 
can conclude that the data do not come from a population 
that has a normal distribution. 

5. [Normal probability plot]
Because the points are approximately linear, you can conclude that the data come from a population that has a normal distribution.


A 


Addition Rule, 211 

for the probability of A or B, 180, 183 
alternative formula 

for the standardized test statistic for a 

proportion, 414 

for variance and standard deviation, 120 
alternative hypothesis 

one-sample, 371 

two-sample, 442 
analysis of variance (ANOVA) test 

one-way, 580, 581 

two-way, 586 
approximating binomial probabilities, 300 
area of a region, under a standard normal 

curve, 261, A4 


B


back-to-back stem-and-leaf plot, 88
Bayes’ Theorem, 178 
biased sample, 43 
bimodal, 91 
binomial distribution, 241 
mean of a, 231 
normal approximation to a, 297 
population parameters of a, 231 
standard deviation of a, 231 
variance of a, 231 
binomial experiment, 223 
notation for, 223 
binomial probabilities, using the normal 
distribution to approximate, 300 
binomial probability distribution, 226, 241 
binomial probability formula, 225 
bivariate normal distribution, 495, 524 
blinding, 41 
blocks, 41 
box-and-whisker plot, 126 
drawing a, 127 
modified, 127 
side-by-side, 135 
boxplot, 126 


C 


calculating a correlation coefficient, 496 
categories, 548 
cause-and-effect relationship between 
variables, 502 
c-confidence interval 
for a population mean, 323 
for a population proportion, 343 
cell, 558 
census, 25, 43 
center, 62 
central angle, 80 
Central Limit Theorem, 285 


chart 
control, 273 
Pareto, 81 
pie, 80 
time series, 83 
Chebychev’s Theorem, 111 
chi-square 
distribution, 352 
properties of, 352 
goodness-of-fit test, 548, 550 
test statistic for, 550 
independence test, 560 
test 
finding critical values for, 416 
for standard deviation, 418, 425 
test statistic for, 418 
for variance, 418, 425 
class, 62 
boundaries, 66 
cumulative frequency of, 64 
mark, 64 
midpoint, 64 
open-ended, 64 
relative frequency of, 64 
width, 62 
class limit 
lower, 62 
upper, 62 
classical probability, 156, 183 
closed question, 48 
cluster sample, 44 
clusters, 44 
of data, 95 
coefficient 
correlation, 495 
calculating, 496 
Pearson product moment, 495 
population, 495 
t-test for, 500 
using a table for, 498 
of determination, 521 
of variation, 109, 114 
combination of n objects taken r at a time, 
193, 194 
complement of event E, 160, 183 
completely randomized design, 41 
conditional probability, 169 
conditional relative frequency, 569 
confidence interval, 323 
for the difference between means, 457 
for the difference between two 
population proportions, 475 
for the mean of the differences of paired 
data, 468 
for a population mean, constructing a
σ known, 323
σ unknown, 334
for a population proportion, constructing a, 343
for a population standard deviation, 354
for a population variance, 354
for σ₁²/σ₂², 579
for slope, 530 
for y-intercept, 530 
confidence, level of, 321 
confounding variable, 41, 476 
constructing 
a confidence interval for the difference 
between means, 457 
a confidence interval for the difference 
between two population 
proportions, 475 
a confidence interval for the mean of the 
differences of paired data, 468 
a confidence interval for a population 
mean 
σ known, 323
σ unknown, 334
a confidence interval for a population 
proportion, 343 
a confidence interval for a population 
standard deviation, 354 
a confidence interval for a population 
variance, 354 
a cumulative frequency graph, 69 
a discrete probability distribution, 214 
a frequency distribution from a data set, 62 
an ogive, 69 
a prediction interval for y for a specific 
value of x, 524 
contingency table, 558 
contingency table cells, finding the expected 
frequency for, 558 
continuity correction, 299 
continuous probability distribution, 256 
continuous random variable, 212 
control 
chart, 273 
group, 39 
convenience sample, 45 
correction, continuity, 299 
correction factor 
finite, 295 
finite population, 331 
correlation, 492 
correlation coefficient, 495 
calculating a, 496 
Pearson product moment, 495 
population, 495 
t-test for, 500 
using a table for, 498 
Counting Principle, Fundamental, 154, 194 
c-prediction interval, 524 
critical region, 390 
critical value, 321, 390 
for a chi-square test, finding, 416 
for the F-distribution, finding, 572 
in the standard normal distribution, 
finding, 390 
in a t-distribution, finding, 399


12 INDEX 


cumulative frequency, 64 
graph, 68 
constructing, 69 
curve, normal, 256 


D 


data, 24 
class, 62 
clusters, 95 
interval, 62 
outliers, 78, 92 
qualitative, 31 
quantitative, 31 
range, 62, 104 
data sets 
center of, 62 
constructing a frequency distribution 
from, 62 
five-number summary, 127 
gap, 92 
outlier, 78, 92 
paired, 82 
range of, 62, 104 
shape of, 62 
variability of, 62 
decile, 128 
decision rule 
based on P-value, 378, 385 
based on rejection region, 392 
degrees of freedom, 332 
corresponding to the variance in the 
denominator, 571 
corresponding to the variance in the 
numerator, 571 
density function, probability, 256 
dependent 
event, 170, 183 
random variable, 222 
samples, 440 
variable, 492 
descriptive statistics, 27 
designing a statistical study, 39 
determination, coefficient of, 521 
deviation, 105 
explained, 520 
mean absolute, 121 
standard, 105, 107 
total, 520 
unexplained, 520 
d.f.D, 571
d.f.N, 571
diagram, tree, 152 
discrete probability distribution, 213 
constructing, 214 
discrete random variable, 212 
expected value of a, 218 
mean of a, 216 
standard deviation of a, 217 
variance of a, 217 
distinguishable permutation, 192, 194 
distribution 
binomial, 241 
normal approximation to, 297 
population parameters of, 231 
binomial probability, 226, 241 


bivariate normal, 495, 524 
chi-square, 352 
properties of, 352 
continuous probability, 256 
discrete probability, 213 
F-, 571
frequency, 62 
geometric, 238, 241 
hypergeometric, 244 
normal, 256 
properties of a, 256 
Poisson, 239, 241 
sampling, 283 
of sample means, 283 
of sample proportions, 295 
standard normal, 259, A1, A2
finding critical values in, 390 
properties of, 259, A2 
t-, 332 
finding critical values in, 399 
properties of a, 332 
uniform, 267 
dot plot, 79 
double-blind experiment, 41 
drawing a box-and-whisker plot, 127 


E 


e, 239 
effect 
Hawthorne, 41 
interaction, 586 
main, 586 
placebo, 41 
elements of a well-designed experiment, 41 
empirical probability, 156, 157, 183 
Empirical Rule (or 68-95-99.7 Rule), 110 
equation 
exponential, 517 
logarithmic, 517 
multiple regression, 531 
power, 517 
of a regression line, 509 
error 
of estimate 
maximum, 322 
standard, 522 
margin of, 322 
of the mean, standard, 283 
sampling, 43, 283, 322 
tolerance, 322 
type I, 373 
type II, 373 
estimate 
interval, 321 
point, 320 
for p, 342 
for σ, 352
for σ², 352
pooled, of the standard deviation, 450 
standard error of, 522 
estimating μ by minimum sample size, 326
estimating p by minimum sample size, 346 
estimator, unbiased, 320 
event, 152 
complement of an, 160, 183 


dependent, 170, 183 

independent, 170, 183 

mutually exclusive, 179, 183 

simple, 153 
expected frequency, 549 

finding for contingency table cells, 558 
expected value, 218 

of a discrete random variable, 218 
experiment, 39 

binomial, 223 

double-blind, 41 

multinomial, 235, 548 

natural, 48 

probability, 152 

well-designed, elements of, 41 
experimental design 

completely randomized, 41 

matched-pairs, 42 

randomized block, 41 
experimental unit, 39 
explained deviation, 520 
explained variation, 520 
explanatory variable, 492 
exploratory data analysis (EDA), 77 
exponential equation, 517 


F 


factorial, 190 
false positive, 178 
F-distribution, 571 
finding critical values for, 572 
finding areas under the standard normal 
curve, 261, A4 
finding critical values 
for a chi-square test, 416 
for the F-distribution, 572 
in the standard normal distribution, 390 
in a t-distribution, 399
finding the expected frequency for 
contingency table cells, 558 
finding the mean of a frequency distribution, 
94 
finding a minimum sample size 
to estimate μ, 326
to estimate p, 346 
finding the population variance and standard 
deviation, 106 
finding the P-value for a hypothesis test, 385 
finding the sample variance and standard 
deviation, 107 
finding the standard error of estimate, 522 
finding the test statistic for the one-way 
ANOVA test, 581 
finite correction factor, 295 
finite population correction factor, 331 
first quartile, 124 
five-number summary, 127 
formula, binomial probability, 225 
fractiles, 124 
frequency, 62 
conditional relative, 569 
cumulative, 64 
expected, 549 
joint, 558 
marginal, 558 


observed, 549 
relative, 64 
frequency distribution, 62 
constructing from a data set, 62 
mean of, 94 
rectangular, 95 
skewed left (negatively skewed), 95 
skewed right (positively skewed), 95 
standard deviation of, 112 
symmetric, 95 
uniform, 95 
frequency histogram, 66 
relative, 68 
frequency polygon, 67 
F-test for variances, two-sample, 574 
function, probability density, 256 
Fundamental Counting Principle, 154, 194 


G 


gaps, 92 
geometric distribution, 238, 241 
mean of a, 244 
variance of a, 244 
geometric probability, 238 
goodness-of-fit test, chi-square, 548, 550 
grand mean, 581 
graph 
cumulative frequency, 68 
misleading, 88 
group 
control, 39 
treatment, 39 
grouped data 
mean of, 94 
standard deviation of, 112 


H 


Hawthorne effect, 41 
histogram 
frequency, 66 
relative frequency, 68 
history of statistics timeline, 57 
homogeneity of proportions test, 568 
hypergeometric distribution, 244 
hypothesis 
alternative, 371, 442 
null, 371, 442 
statistical, 371 
hypothesis test, 370 
finding the P-value for, 385 
left-tailed, 376 
level of significance, 375 
right-tailed, 376 
two-tailed, 376 
hypothesis testing 
for the mean 
σ known, 385
σ unknown, 399
for a population proportion, 410 
for slope, 530 
for standard deviation, 418 
steps for, 379 
summary of, 424, 425 
for variance, 418


I


independence test, chi-square, 560
independent
event, 170, 183
random variable, 222
samples, 440
variable, 492
inferential statistics, 27 
inflection points, 256 
influential point, 516 
inherent zero, 33 
interaction effect, 586 
interquartile range (IQR), 125, 126
using to identify outliers, 126
interval(s), 62
confidence, 323
for σ, 354
for σ², 354
c-prediction, 524
prediction, 524
interval estimate, 321 
interval level of measurement, 33, 34 


J 


joint frequency, 558 


L 


law of large numbers, 158 
leaf, 77 
left, skewed, 95 
left-tailed test, 376, 416 
level of confidence, 321 
level of significance, 375, 498 
levels of measurement 
interval, 33, 34 
nominal, 32, 34 
ordinal, 32, 34 
ratio, 33, 34 
limit 
lower class, 62 
upper class, 62 
line 
of best fit, 508 
regression, 498, 508 
linear transformation of a random variable, 
222 
logarithmic 
equation, 517 
transformation, 517 
losing, odds of, 167 
lower class limit, 62 
lurking variable, 502 


M 


main effect, 586
making an interval estimate, 321
margin of error, 322
marginal frequency, 558
matched samples, 440
matched-pairs design, 42
maximum error of estimate, 322


INDEX 13 


mean, 89 
of a binomial distribution, 231 
of a discrete random variable, 216 
of a frequency distribution, 94 
of a geometric distribution, 244 
grand, 581 
standard error, 283 
trimmed, 102 
t-test for, 401 
weighted, 93 
z-test for, 387 
mean absolute deviation (MAD), 121 
mean square 
between, 580 
within, 580 
means 
difference between 
two-sample t-test for, 450
two-sample z-test for, 443 
sampling distribution of sample, 283 
measure of central tendency, 89 
mean, 89 
median, 89, 90 
midrange, 102 
mode, 89, 91 
measure of position 
decile, 128 
fractile, 124 
midquartile, 135 
percentile, 128 
quartile, 124, 128 
first, 124 
second, 124 
third, 124 
measure of variation 
interquartile range, 125, 126 
population standard deviation, 105 
population variance, 105 
range, 104 
sample standard deviation, 107 
sample variance, 107 
measurement 
interval level of, 33, 34 
nominal level of, 32, 34 
ordinal level of, 32, 34 
ratio level of, 33, 34 
median, 89, 90 
midpoint, 64 
midquartile, 135 
midrange, 102 
minimum sample size 
to estimate μ, 326
to estimate p, 346 
misleading graph, 88 
mode, 89, 91 
modified box-and-whisker plot, 127 
multinomial experiment, 235, 548 
multiple regression equation, 531 
Multiplication Rule for the probability of A 
and B, 171, 183 
mutually exclusive, 179, 183


N


n factorial, 190
natural experiment, 48
negative linear correlation, 492
negatively skewed, 95 
no correlation, 492 
nominal level of measurement, 32, 34 
nonlinear correlation, 492 
normal approximation to a binomial 
distribution, 297 
normal curve, 256 
normal distribution, 256 
bivariate, 495, 524 
properties of a, 256 
standard, 259, A1, A2
finding areas under, 261, A4 
finding critical values in, 390 
properties of, 259, A2 
using to approximate binomial 
probabilities, 300 
normal probability plot, A28 
normal quantile plot, A28 
notation for binomial experiment, 223 
null hypothesis 
one-sample, 371 
two-sample, 442 


O


observational study, 39 
observed frequency, 549 
odds, 167
of losing, 167
of winning, 167
ogive, 68
constructing, 69
one-way analysis of variance, 580
test, 580, 581
finding the test statistic for, 581
open question, 48
open-ended class, 64
ordinal level of measurement, 32, 34
outcome, 152
outlier, 78, 92
using the interquartile range to identify, 126


P 


paired data sets, 82 
paired samples, 440 
parameter, 26 
population, binomial distribution, 231 
Pareto chart, 81 
Pearson product moment correlation 
coefficient, 495 
Pearson’s index of skewness, 121 
percentile, 128 
that corresponds to a specific data entry 
x, 128 
performing 
a chi-square goodness-of-fit test, 550 
a chi-square independence test, 561 
a one-way analysis of variance test, 582 
permutation, 190, 194 
distinguishable, 192, 194 
of n objects taken r at a time, 190, 194 
pie chart, 80
placebo, 39
effect, 41 
plot 
back-to-back stem-and-leaf, 88 
box-and-whisker, 126 
dot, 79 
normal probability, A28 
normal quantile, A28 
residual, 516 
scatter, 82, 492 
side-by-side box-and-whisker, 135 
stem-and-leaf, 77 
point 
inflection, 256 
influential, 516 
point estimate, 320 
for p, 342 
for σ, 352
for σ², 352
Poisson distribution, 239, 241 
variance of a, 244 
polygon, frequency, 67 
pooled estimate of the standard deviation, 
450 
population, 25 
correlation coefficient, 495 
using Table 11 for the, 498 
using the t-test for the, 500
mean, constructing a confidence interval 
for 
σ known, 323
σ unknown, 334
parameters of a binomial distribution, 231 
proportion, 342 
constructing a confidence interval 
for, 343 
standard deviation, 105 
constructing a confidence interval 
for, 354 
finding, 106 
point estimate for, 352 
variance, 105 
constructing a confidence interval 
for, 354 
finding, 106 
point estimate for, 352 
positive linear correlation, 492 
positively skewed, 95 
power equation, 517 
power of the test, 375 
prediction interval, 524 
constructing, 524 
Principle, Fundamental Counting, 154, 194 
probability, 152 
Addition Rule for, 180, 183 
classical, 156, 183 
conditional, 169 
density function, 256 
empirical, 156, 157, 183 
experiment, 152 
formula, binomial, 225 
geometric, 238 
Multiplication Rule for, 171, 183 
plot, normal, A28 
rule, range of, 159, 183 
statistical, 157 
subjective, 156, 159
summary of, 183
that the first success will occur on trial 
number x, 238, 241 
theoretical, 156 
value, 375 
probability distribution 
binomial, 226, 241 
chi-square, 352 
continuous, 256 
discrete, 213 
geometric, 238, 241 
hypergeometric, 244 
normal, 256 
properties of a, 259 
Poisson, 239, 241 
sampling, 283 
of sample means, 283 
standard normal, 259 
uniform, 267 
probability plot, normal, A28 
properties 
of the chi-square distribution, 352 
of a normal distribution, 256 
of sampling distributions of sample 
means, 283 
of the standard normal distribution, 259, A2 
of the t-distribution, 332 
proportion 
population, 342 
confidence interval for, 343 
z-test for a, 410 
sample, 295 
proportions, sampling distribution of sample, 
295 
Punnett square, 166 
P-value, 375 
decision rule based on, 378, 385 
for a hypothesis test, finding the, 385 
using for a z-test for a mean, 387 


Q


qualitative data, 31 
quantile plot, normal, A28 
quantitative data, 31 
quartile, 124, 128
first, 124
second, 124
third, 124
question
closed, 48
open, 48


R 


random sample, 43 
simple, 43 
random sampling, 25 
random variable, 212 
continuous, 212 
dependent, 222 
discrete, 212 
expected value of a, 218 
mean of a, 216 
standard deviation of a, 217 
variance of a, 217
independent, 222
linear transformation of a, 222 
randomization, 41 
randomized block design, 41 
range, 62, 104 
interquartile, 125, 126 
of probabilities rule, 159, 183 
ratio level of measurement, 33, 34 
rectangular, frequency distribution, 95 
region 
critical, 390 
rejection, 390 
regression equation, multiple, 531 
regression line, 498, 508 
deviation about, 520 
equation of, 509 
variation about, 520 
rejection region, 390 
decision rule based on, 392 
relative frequency, 64 
conditional, 569 
histogram, 68 
replacement 
with, 44 
without, 44 
replication, 42 
residual plot, 516 
residuals, 508 
response variable, 492 
right, skewed, 95 
right-tailed test, 376, 416 
round-off rule, 89, 106, 156, 216, 323, 344, 355, 
496, 509 
rule 
Addition, for the probability of A or B, 
180, 183 
decision 
based on P-value, 378, 385 
based on rejection region, 392 
Empirical, 110 
Multiplication, for the probability of A 
and B, 171, 183 
range of probabilities, 159, 183 
round-off, 89, 106, 156, 216, 323, 344, 355, 
496, 509 


S


sample, 25 
biased, 43 
cluster, 44 
convenience, 45 
dependent, 440 
independent, 440 
matched, 440 
paired, 440 
proportion, 295 
random, 43 
simple, 43 
stratified, 44 
systematic, 45 
sample means 
sampling distribution of, 283 
sampling distribution for the difference 
of, 442 
sample proportion, 295
sample proportions, sampling distribution
of, 295 
sample size, 42 
minimum to estimate μ, 326
minimum to estimate p, 346 
sample space, 152 
sample standard deviation, 107 
finding, 107 
for grouped data, 112 
sample variance, 107 
finding, 107 
sampling, 43 
random, 25 
sampling distribution, 283 
for the difference of the sample means, 
442 
for the difference between the sample 
proportions, 469 
for the mean of the differences of the 
paired data entries in dependent 
samples, 459 
of sample means, 283 
properties of, 283 
of sample proportions, 295 
sampling error, 43, 283, 322 
sampling process 
with replacement, 44 
without replacement, 44 
scatter plot, 82, 492 
Scheffé Test, 591 
score, standard, 129 
second quartile, 124 
shape of a data set, 62 
side-by-side box-and-whisker plot, 135 
sigma, 63 
significance, level of, 375, 498 
simple event, 153 
simple random sample, 43 
simulation, 40 
skewed 
left, 95 
negatively, 95 
positively, 95 
right, 95 
slope 
confidence interval for, 530 
hypothesis testing for, 530 
standard deviation, 105, 107 
of a binomial distribution, 231 
chi-square test for, 418, 425 
confidence interval for, 354 
of a discrete random variable, 217 
of a frequency distribution, 112 
point estimate for, 352 
pooled estimate of, 450 
population, 105 
finding, 106 
sample, 107 
finding, 107 
for grouped data, 112 
standard error 
of estimate, 522 
finding, 522 
of the mean, 283 
standard normal curve, finding areas under,
261, A4
standard normal distribution, 259, A1, A2
finding critical values in, 390 
properties of, 259, A2 
standard score, 129 
standardized test statistic, 375 
for a chi-square test 
for standard deviation, 418, 425 
for variance, 418, 425 
for the correlation coefficient, 500 
for the difference between means 
t-test, 460 
z-test, 443 
for the difference between proportions, 470 
for a t-test 
for a mean, 379, 425
two-sample, 450 
for a z-test 
for a mean, 387, 425 
for a proportion, 410, 425 
two-sample, 443 
statistic, 26 
statistical hypothesis, 371 
statistical probability, 157 
statistical process control (SPC), 273 
statistical study, designing a, 39 
statistics, 24 
descriptive, 27 
history of, timeline, 57 
inferential, 27 
status, 24 
stem, 77 
stem-and-leaf plot, 77 
back-to-back, 88 
steps for hypothesis testing, 379 
strata, 44 
stratified sample, 44 
study 
observational, 39 
statistical, designing a, 39 
subjective probability, 156, 159 
sum of squares, 105 
summary 
of counting principles, 194 
of discrete probability distributions, 241 
five-number, 127 
of four levels of measurement, 34 
of hypothesis testing, 424, 425 
of probability, 183 
survey, 40 
survey questions 
closed question, 48 
open question, 48 
symmetric, frequency distribution, 95 
systematic sample, 45 


T 


table, contingency, 558 
t-distribution, 332 
constructing a confidence interval for a 
population mean, 334 
finding critical values in, 399 
properties of, 332 
test 
chi-square 
goodness-of-fit, 548, 550
independence, 560
for standard deviation, 418 
for variance, 418 
homogeneity of proportions, 568 
hypothesis, 370 
left-tailed, 376, 416 
one-way analysis of variance, 580, 581 
power of the, 375 
right-tailed, 376, 416 
Scheffé, 591 
t-test, 500 
two-tailed, 376, 416 
two-way analysis of variance, 586 
test statistic, 375 
for a chi-square test, 418, 425 
goodness-of-fit test, 550 
independence test, 560 
for the correlation coefficient, 500 
for the difference between means, 443, 
460 
for the difference between proportions, 
470 
for a mean 
σ known, 387, 425
σ unknown, 401, 425
for a one-way analysis of variance test, 
581 
for a proportion, 410, 425 
standardized, 375 
for a two-sample F-test, 574 
for a two-sample t-test, 450
for a two-sample z-test, 443 
Theorem 
Bayes’, 178 
Central Limit, 285 
Chebychev’s, 111 
theoretical probability, 156 
third quartile, 124 
time series, 83 
chart, 83 
timeline, history of statistics, 57 
tolerance, error, 322 
total deviation, 520 
total variation, 520 
transformation 
linear, 222 
logarithmic, 517 
transformations to achieve linearity, 517 
transforming a z-score to an x-value, 276 
treatment, 39 
treatment group, 39 
tree diagram, 152 
trimmed mean, 102 
t-test, 450 
for the correlation coefficient, 500 
for the difference between means, 460 
for a mean, 401, 425 
two-sample, for the difference between 
means, 450 
two-sample 
F-test for variances, 574 
t-test, 450
z-test
for the difference between means, 443
for the difference between
proportions, 470
two-tailed test, 376, 416
two-way analysis of variance test, 586
type I error, 373
type II error, 373


U 


unbiased estimator, 320 
unexplained deviation, 520 
unexplained variation, 520 
uniform distribution, 267 
uniform frequency distribution, 95 
upper class limit, 62 
using 
the chi-square test for a variance or 
standard deviation, 418 
the interquartile range to identify 
outliers, 126 
a normal distribution to approximate 
binomial probabilities, 300 
P-values for a z-test for a mean, 387 
rejection regions for a z-test for a mean, 
392 
Table 11 for the correlation coefficient, 
498 
the t-test
for the correlation coefficient, 500
for the difference between means, 460
for a mean μ, 401
a two-sample F-test to compare σ₁² and
σ₂², 574
a two-sample t-test for the difference
between means, 451
a two-sample z-test 
for the difference between means, 443 
for the difference between 
proportions, 470 
a z-test for a proportion, 410 


V 


value 
critical, 321, 390 
expected, 218 
probability, 375 
variability of a data set, 62 
variable(s) 
cause-and-effect relationship between, 
502 
confounding, 41, 476 
dependent, 492 
explanatory, 492 
independent, 492 
lurking, 502 
random, 212 
continuous, 212 
dependent, 222
discrete, 212
independent, 222
response, 492
variance, 105, 107
of a binomial distribution, 231
chi-square test for, 418, 425
confidence interval for, 354
of a discrete random variable, 217
of a geometric distribution, 244
mean square
between, 580
within, 580
one-way analysis of, 580
point estimate for, 352
of a Poisson distribution, 244
population, 105
finding, 106
sample, 107
finding, 107
two-sample F-test for, 574
two-way analysis of, 586
variation
coefficient of, 109, 114
explained, 520 
total, 520 
unexplained, 520 


W 


weighted mean, 93
well-designed experiment, elements of, 41
width of a class, 62
winning, odds of, 167
with replacement, 44
without replacement, 44


X 


x, random variable, 212 
x-value, transforming a z-score to an, 276 


Y 


y-intercept, confidence interval for, 530 


Z


zero, inherent, 33 
z-score, 129 
transforming to an x-value, 276 
z-test 
for a mean μ, 387, 425
test statistic for, 387, 425 
using P-values for, 387 
using rejection regions for, 392 
for a proportion, 410, 425 
test statistic for, 410, 425 
two-sample 
difference between means, 443 
difference between proportions, 470 


Photo Credits


Cover Credits: Curioso/Shutterstock 


Multiple Uses: Female in pink shirt with books and 
backpack: Sze Fei/Shutterstock; Male in gray shirt sitting 
with laptop: Susan Kim/Fotolia; Female in reddish orange 
shirt with books: Hugo Félix/Fotolia; Female in plaid shirt
with books and backpack: Elnur/Fotolia; Male in blue 
shirt with books and backpack: Odua Images/Shutterstock; 
Female in light gray shirt holding books: Kurhan/Fotolia; 
Male in gray shirt holding laptop with backpack: 
Odua Images/Shutterstock; Male in plaid shirt with 
backpack: Michael Jung/Shutterstock; Male in light blue 
shirt holding paper: Antonio Diaz/Fotolia; Female in tan 
jacket sitting with open book: Lithian/Shutterstock; Male 
in green shirt with backpack: GVS/Fotolia; Female in 
white shirt with scarf holding laptop: Lenets_Tan/Fotolia; 
Abstract 3d vector sphere with glossy mosaic design 
(Picturing the World icon): Red shine studio/Shutterstock; 
Internet icon (Applet icon): Icojam/Shutterstock 


Chapter 1 p. 22 B Brown/Shutterstock; p. 42 Olinchuk/ 
Shutterstock; p. 57 Pearson Education, Inc. 


Chapter 2 p. 60 Mike Segar/Reuters/Alamy Stock Photo; 
p. 123 Michaeljung/Fotolia


Chapter 3 p. 150 Steve Powell/Staff/Getty Images; p. 172 
DNY59/E+/Getty Images 


Chapter 4 p. 210 Beeboys/Shutterstock; p. 237 Mark 
Lomoglio/Icon Smi Ccx/Newscom 


Chapter 5 p. 254 Smereka/Shutterstock 


Chapter 6 p. 318 WavebreakMediaMicro/Fotolia; p. 335 
Pearson Education, Inc.; p. 341 Maridav/Shutterstock; 
p. 364 Environmental Protection Agency 


Chapter 7 p. 368 Corbis/Getty Images; p. 393 Nitinut380/ 
Shutterstock 


Chapter 8 p. 438 Lucky Images/Fotolia; p. 451 Emily2k/ 
iStock/Getty Images; p. 455 Kletr/Shutterstock; p. 458 
Andres Rodriguez/Fotolia 


Chapter 9 p. 490 Robert Beck/Sports Illustrated/Getty 
Images; p. 544 Smileus/Shutterstock; p. 545 U.S. Food and 
Drug Administration 


Chapter 10 p. 546 Lisa S/Shutterstock; p. 592 Bogdan 
Vasilescu/Shutterstock 


P1




Key Formulas 


From Larson/Farber Elementary Statistics: Picturing the World, Seventh Edition 
© 2019 Pearson 


CHAPTER 2 
Class Width = $\dfrac{\text{Range of data}}{\text{Number of classes}}$ (round up to next convenient number)

Midpoint = $\dfrac{(\text{Lower class limit}) + (\text{Upper class limit})}{2}$

Relative Frequency = $\dfrac{\text{Class frequency}}{\text{Sample size}} = \dfrac{f}{n}$

Population Mean: $\mu = \dfrac{\sum x}{N}$

Sample Mean: $\bar{x} = \dfrac{\sum x}{n}$

Weighted Mean: $\bar{x} = \dfrac{\sum (x \cdot w)}{\sum w}$

Mean of a Frequency Distribution: $\bar{x} = \dfrac{\sum (x \cdot f)}{n}$

Range = (Maximum entry) $-$ (Minimum entry)

Population Variance: $\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$

Population Standard Deviation: $\sigma = \sqrt{\sigma^2} = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$

Sample Variance: $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$

Sample Standard Deviation: $s = \sqrt{s^2} = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$

Empirical Rule (or 68-95-99.7 Rule) For data sets with distributions that are approximately symmetric and bell-shaped:

1. About 68% of the data lie within one standard deviation of the mean.

2. About 95% of the data lie within two standard deviations of the mean.

3. About 99.7% of the data lie within three standard deviations of the mean.

Chebychev's Theorem The portion of any data set lying within k standard deviations (k > 1) of the mean is at least $1 - \dfrac{1}{k^2}$.

Sample Standard Deviation of a Frequency Distribution: $s = \sqrt{\dfrac{\sum (x - \bar{x})^2 f}{n - 1}}$

Standard Score: $z = \dfrac{\text{Value} - \text{Mean}}{\text{Standard deviation}} = \dfrac{x - \mu}{\sigma}$
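The Chapter 2 formulas map directly onto a few lines of code. The sketch below is an illustrative Python translation using only the standard library; the function names and the sample data are hypothetical and not part of the formula card. Note that the population versions divide by N while the sample versions divide by n - 1.

```python
import math

def population_stats(data):
    """Population mean, variance, and standard deviation (divide by N)."""
    N = len(data)
    mu = sum(data) / N
    var = sum((x - mu) ** 2 for x in data) / N
    return mu, var, math.sqrt(var)

def sample_stats(data):
    """Sample mean, variance, and standard deviation (divide by n - 1)."""
    n = len(data)
    xbar = sum(data) / n
    var = sum((x - xbar) ** 2 for x in data) / (n - 1)
    return xbar, var, math.sqrt(var)

def weighted_mean(values, weights):
    """Sum of x*w divided by the sum of the weights."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

def frequency_mean(midpoints, freqs):
    """Mean of a frequency distribution: sum of x*f divided by n = sum of f."""
    return sum(x * f for x, f in zip(midpoints, freqs)) / sum(freqs)

def z_score(x, mean, sd):
    """Standard score: (value - mean) / standard deviation."""
    return (x - mean) / sd

def chebychev_bound(k):
    """Minimum proportion of any data set within k standard deviations (k > 1)."""
    return 1 - 1 / k ** 2

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_stats(data))  # (5.0, 4.0, 2.0)
print(z_score(9, 5.0, 2.0))    # 2.0
```

For this data set, Chebychev's bound with k = 2 guarantees at least 75% of the entries lie within two standard deviations of the mean, whatever the shape of the distribution.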


CHAPTER 3 


Classical (or Theoretical) Probability: $P(E) = \dfrac{\text{Number of outcomes in event } E}{\text{Total number of outcomes in sample space}}$

Empirical (or Statistical) Probability: $P(E) = \dfrac{\text{Frequency of event } E}{\text{Total frequency}} = \dfrac{f}{n}$

Probability of a Complement: $P(E') = 1 - P(E)$

Probability of occurrence of both events A and B:
$P(A \text{ and } B) = P(A) \cdot P(B \mid A)$
$P(A \text{ and } B) = P(A) \cdot P(B)$ if A and B are independent

Probability of occurrence of either A or B:
$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$
$P(A \text{ or } B) = P(A) + P(B)$ if A and B are mutually exclusive

Permutations of n objects taken r at a time: ${}_nP_r = \dfrac{n!}{(n - r)!}$, where $r \le n$

Distinguishable Permutations: $n_1$ alike, $n_2$ alike, ..., $n_k$ alike: $\dfrac{n!}{n_1! \cdot n_2! \cdot n_3! \cdots n_k!}$, where $n_1 + n_2 + n_3 + \cdots + n_k = n$

Combinations of n objects taken r at a time: ${}_nC_r = \dfrac{n!}{(n - r)!\,r!}$, where $r \le n$
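As an illustration only, the counting formulas and the classical probability definition can be checked with Python's standard library (`math.factorial`, `fractions.Fraction`, `itertools.product`). The helper names are hypothetical, and the two-dice sample space is an assumed example.

```python
import math
from fractions import Fraction
from itertools import product

def n_p_r(n, r):
    """Permutations of n objects taken r at a time: n! / (n - r)!, r <= n."""
    return math.factorial(n) // math.factorial(n - r)

def n_c_r(n, r):
    """Combinations of n objects taken r at a time: n! / ((n - r)! r!), r <= n."""
    return math.factorial(n) // (math.factorial(n - r) * math.factorial(r))

def distinguishable_permutations(counts):
    """n! / (n1! n2! ... nk!), where counts = [n1, ..., nk] and n = sum(counts)."""
    result = math.factorial(sum(counts))
    for c in counts:
        result //= math.factorial(c)  # each partial quotient is still an integer
    return result

# Classical probability: equally likely outcomes of rolling two dice.
sample_space = list(product(range(1, 7), repeat=2))
p_sum_7 = Fraction(sum(1 for a, b in sample_space if a + b == 7), len(sample_space))
p_not_sum_7 = 1 - p_sum_7  # complement rule

print(n_p_r(5, 2), n_c_r(5, 2))                    # 20 10
print(distinguishable_permutations([1, 4, 4, 2]))  # 34650 (letters of MISSISSIPPI)
print(p_sum_7, p_not_sum_7)                        # 1/6 5/6
```

Using `Fraction` keeps the probabilities exact, which makes the complement rule easy to verify by eye.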




CHAPTER 4 
Mean of a Discrete Random Variable: $\mu = \sum x P(x)$

Variance of a Discrete Random Variable: $\sigma^2 = \sum (x - \mu)^2 P(x)$

Standard Deviation of a Discrete Random Variable: $\sigma = \sqrt{\sigma^2} = \sqrt{\sum (x - \mu)^2 P(x)}$

Expected Value: $E(x) = \mu = \sum x P(x)$

Binomial Probability of x successes in n trials: $P(x) = {}_nC_x\, p^x q^{n-x} = \dfrac{n!}{(n - x)!\,x!}\, p^x q^{n-x}$

Population Parameters of a Binomial Distribution:
Mean: $\mu = np$    Variance: $\sigma^2 = npq$
Standard Deviation: $\sigma = \sqrt{npq}$

Geometric Distribution: The probability that the first success will occur on trial number x is $P(x) = p q^{x-1}$, where $q = 1 - p$.

Poisson Distribution: The probability of exactly x occurrences in an interval is $P(x) = \dfrac{\mu^x e^{-\mu}}{x!}$, where $e \approx 2.71828$ and $\mu$ is the mean number of occurrences per interval unit.
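A minimal sketch of the Chapter 4 formulas, assuming Python 3.8 or later for `math.comb`. The helper names are hypothetical. Building a binomial distribution from the probability formula and then applying the discrete mean and variance formulas should reproduce μ = np and σ² = npq.

```python
import math

def discrete_summary(dist):
    """dist maps each value x to P(x); returns (mean, variance, standard deviation)."""
    if abs(sum(dist.values()) - 1) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, var, math.sqrt(var)

def binomial_pmf(x, n, p):
    """P(x) = nCx * p^x * q^(n - x)."""
    q = 1 - p
    return math.comb(n, x) * p ** x * q ** (n - x)

def geometric_pmf(x, p):
    """Probability that the first success occurs on trial x (x = 1, 2, 3, ...)."""
    return p * (1 - p) ** (x - 1)

def poisson_pmf(x, mu):
    """Probability of exactly x occurrences when the mean per interval is mu."""
    return mu ** x * math.exp(-mu) / math.factorial(x)

# A binomial distribution built from the formula; its mean and variance
# should match np and npq.
n, p = 4, 0.5
dist = {x: binomial_pmf(x, n, p) for x in range(n + 1)}
print(discrete_summary(dist))  # mean 2.0, variance 1.0, standard deviation 1.0
```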


CHAPTER 5 
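Standard Score, or z-Score: $z = \dfrac{\text{Value} - \text{Mean}}{\text{Standard deviation}} = \dfrac{x - \mu}{\sigma}$

Transforming a z-Score to an x-Value: $x = \mu + z\sigma$

Central Limit Theorem ($n \ge 30$ or population is normally distributed):

Mean of the Sampling Distribution: $\mu_{\bar{x}} = \mu$

Variance of the Sampling Distribution: $\sigma_{\bar{x}}^2 = \dfrac{\sigma^2}{n}$

Standard Deviation of the Sampling Distribution (Standard Error): $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$

z-Score $= \dfrac{\text{Value} - \text{Mean}}{\text{Standard Error}} = \dfrac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$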
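The following sketch, with hypothetical names and made-up example values, applies the standard error and the z-score for a sample mean. It uses `statistics.NormalDist` (Python 3.8+) for the area under the standard normal curve instead of a table lookup, so its results agree with Table 4 up to rounding.

```python
import math
from statistics import NormalDist

def standard_error(sigma, n):
    """Standard deviation of the sampling distribution of x-bar: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def z_for_sample_mean(xbar, mu, sigma, n):
    """z = (x-bar - mu) / (sigma / sqrt(n)); assumes n >= 30 or a normal population."""
    return (xbar - mu) / standard_error(sigma, n)

def x_from_z(z, mu, sigma):
    """Transform a z-score back to an x-value: x = mu + z * sigma."""
    return mu + z * sigma

# Hypothetical example: mu = 100, sigma = 15, sample of n = 36 with x-bar = 104.
z = z_for_sample_mean(104, 100, 15, 36)   # 4 / 2.5 = 1.6
p_greater = 1 - NormalDist().cdf(z)       # P(x-bar > 104)
print(round(z, 2), round(p_greater, 4))   # 1.6 0.0548
```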


CHAPTER 6 


c-Confidence Interval for $\mu$: $\bar{x} - E < \mu < \bar{x} + E$, where $E = z_c \dfrac{\sigma}{\sqrt{n}}$ when $\sigma$ is known, the sample is random, and either the population is normally distributed or $n \ge 30$, or $E = t_c \dfrac{s}{\sqrt{n}}$ when $\sigma$ is unknown, the sample is random, and either the population is normally distributed or $n \ge 30$.

Minimum Sample Size to Estimate $\mu$: $n = \left(\dfrac{z_c \sigma}{E}\right)^2$

Point Estimate for p, the population proportion of successes: $\hat{p} = \dfrac{x}{n}$

c-Confidence Interval for Population Proportion p (when $n\hat{p} \ge 5$ and $n\hat{q} \ge 5$): $\hat{p} - E < p < \hat{p} + E$, where $E = z_c \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$

Minimum Sample Size to Estimate p: $n = \hat{p}\hat{q}\left(\dfrac{z_c}{E}\right)^2$

c-Confidence Interval for Population Variance $\sigma^2$: $\dfrac{(n - 1)s^2}{\chi_R^2} < \sigma^2 < \dfrac{(n - 1)s^2}{\chi_L^2}$

c-Confidence Interval for Population Standard Deviation $\sigma$: $\sqrt{\dfrac{(n - 1)s^2}{\chi_R^2}} < \sigma < \sqrt{\dfrac{(n - 1)s^2}{\chi_L^2}}$
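A hedged sketch of the Chapter 6 interval formulas, assuming Python 3.8+. It computes z_c with `statistics.NormalDist.inv_cdf` rather than reading it from a table, so z_c is about 1.96 for c = 0.95 but is not rounded. The t-based interval is omitted because the standard library has no t quantile function; for σ², the chi-square critical values are passed in from Table 6. Function names and example numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_critical(c):
    """Two-sided critical value z_c for level of confidence c."""
    return NormalDist().inv_cdf((1 + c) / 2)

def mean_interval_sigma_known(xbar, sigma, n, c):
    """x-bar +/- E with E = z_c * sigma / sqrt(n)."""
    e = z_critical(c) * sigma / math.sqrt(n)
    return xbar - e, xbar + e

def min_sample_size_mean(sigma, e, c):
    """n = (z_c * sigma / E)^2, rounded up to the next whole number."""
    return math.ceil((z_critical(c) * sigma / e) ** 2)

def proportion_interval(x, n, c):
    """p-hat +/- z_c * sqrt(p-hat * q-hat / n); needs n*p-hat >= 5 and n*q-hat >= 5."""
    p_hat = x / n
    q_hat = 1 - p_hat
    if n * p_hat < 5 or n * q_hat < 5:
        raise ValueError("normal approximation requires n*p-hat >= 5 and n*q-hat >= 5")
    e = z_critical(c) * math.sqrt(p_hat * q_hat / n)
    return p_hat - e, p_hat + e

def min_sample_size_proportion(e, c, p_hat=0.5):
    """n = p-hat * q-hat * (z_c / E)^2; p-hat = 0.5 when no preliminary estimate exists."""
    return math.ceil(p_hat * (1 - p_hat) * (z_critical(c) / e) ** 2)

def variance_interval(s2, n, chi2_right, chi2_left):
    """Bounds for sigma^2 using chi-square critical values read from Table 6."""
    return (n - 1) * s2 / chi2_right, (n - 1) * s2 / chi2_left

print(mean_interval_sigma_known(50, 10, 100, 0.95))  # roughly (48.04, 51.96)
print(min_sample_size_proportion(0.03, 0.95))        # 1068
# d.f. = 29, c = 0.95: chi2_R = 45.722 and chi2_L = 16.047 from Table 6.
print(variance_interval(4.0, 30, 45.722, 16.047))    # roughly (2.537, 7.229)
```

Taking the square root of both variance bounds gives the interval for σ.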




CHAPTER 7 


z-Test for a Mean $\mu$: $z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$, when $\sigma$ is known, the sample is random, and either the population is normally distributed or $n \ge 30$.

t-Test for a Mean $\mu$: $t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$, when $\sigma$ is unknown, the sample is random, and either the population is normally distributed or $n \ge 30$. (d.f. $= n - 1$)

z-Test for a Proportion p (when $np \ge 5$ and $nq \ge 5$): $z = \dfrac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}} = \dfrac{\hat{p} - p}{\sqrt{pq/n}}$

Chi-Square Test for a Variance $\sigma^2$ or Standard Deviation $\sigma$: $\chi^2 = \dfrac{(n - 1)s^2}{\sigma^2}$ (d.f. $= n - 1$)
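The Chapter 7 test statistics can be sketched as plain functions; critical values still come from Table 4, 5, or 6. Names and example inputs are hypothetical, and the P-value helper assumes a two-tailed z-test.

```python
import math
from statistics import NormalDist

def z_test_mean(xbar, mu0, sigma, n):
    """Standardized test statistic for a mean when sigma is known."""
    return (xbar - mu0) / (sigma / math.sqrt(n))

def t_test_mean(xbar, mu0, s, n):
    """Returns (t, d.f.); compare t with a critical value from Table 5."""
    return (xbar - mu0) / (s / math.sqrt(n)), n - 1

def z_test_proportion(p_hat, p0, n):
    """Uses the hypothesized p0 in the standard error; requires n*p0 >= 5 and n*q0 >= 5."""
    q0 = 1 - p0
    if n * p0 < 5 or n * q0 < 5:
        raise ValueError("normal approximation requires np >= 5 and nq >= 5")
    return (p_hat - p0) / math.sqrt(p0 * q0 / n)

def chi_square_variance(s2, sigma2_0, n):
    """Returns (chi-square, d.f.) for a test about sigma^2 or sigma."""
    return (n - 1) * s2 / sigma2_0, n - 1

def two_tailed_p_value(z):
    """P-value for a two-tailed z-test: twice the area beyond |z|."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

print(round(z_test_proportion(0.55, 0.5, 100), 2))  # 1.0
print(round(two_tailed_p_value(1.96), 3))           # 0.05
```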


CHAPTER 8 


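Two-Sample z-Test for the Difference Between Means ($\sigma_1$ and $\sigma_2$ are known, the samples are random and independent, and either the populations are normally distributed or both $n_1 \ge 30$ and $n_2 \ge 30$):

$z = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}}$, where $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$

Two-Sample t-Test for the Difference Between Means ($\sigma_1$ and $\sigma_2$ are unknown, the samples are random and independent, and either the populations are normally distributed or both $n_1 \ge 30$ and $n_2 \ge 30$):

$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}}$

If population variances are equal, d.f. $= n_1 + n_2 - 2$ and
$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \cdot \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$

If population variances are not equal, d.f. is the smaller of $n_1 - 1$ or $n_2 - 1$ and $s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$.

t-Test for the Difference Between Means (the samples are random and dependent, and either the populations are normally distributed or $n \ge 30$):

$t = \dfrac{\bar{d} - \mu_d}{s_d / \sqrt{n}}$, where $\bar{d} = \dfrac{\sum d}{n}$, $s_d = \sqrt{\dfrac{\sum (d - \bar{d})^2}{n - 1}}$, and d.f. $= n - 1$.

Two-Sample z-Test for the Difference Between Proportions (the samples are random and independent, and $n_1\bar{p}$, $n_1\bar{q}$, $n_2\bar{p}$, and $n_2\bar{q}$ are at least 5):

$z = \dfrac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p}\bar{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$, where $\bar{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$ and $\bar{q} = 1 - \bar{p}$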
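A sketch of the Chapter 8 statistics under the conditions stated on the card; helper names and example values are hypothetical. For unequal variances it uses the card's conservative d.f. rule (the smaller of n₁ - 1 and n₂ - 1), not the Welch-Satterthwaite approximation that some software uses. The proportion test assumes the null hypothesis p₁ = p₂.

```python
import math

def two_sample_z(x1, sigma1, n1, x2, sigma2, n2, delta0=0.0):
    """z for the difference between means when sigma1 and sigma2 are known."""
    se = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    return ((x1 - x2) - delta0) / se

def two_sample_t(x1, s1, n1, x2, s2, n2, equal_var, delta0=0.0):
    """Returns (t, d.f.) for independent samples with sigma1 and sigma2 unknown."""
    if equal_var:
        pooled = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
        se = pooled * math.sqrt(1 / n1 + 1 / n2)
        df = n1 + n2 - 2
    else:
        se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
        df = min(n1 - 1, n2 - 1)  # conservative d.f. rule from the formula card
    return ((x1 - x2) - delta0) / se, df

def paired_t(differences, mu_d=0.0):
    """Returns (t, d.f.) for dependent samples from the list of differences d."""
    n = len(differences)
    d_bar = sum(differences) / n
    s_d = math.sqrt(sum((d - d_bar) ** 2 for d in differences) / (n - 1))
    return (d_bar - mu_d) / (s_d / math.sqrt(n)), n - 1

def two_proportion_z(x1, n1, x2, n2):
    """z for p1 - p2 = 0 using the weighted estimate p-bar = (x1 + x2) / (n1 + n2)."""
    p_bar = (x1 + x2) / (n1 + n2)
    q_bar = 1 - p_bar
    se = math.sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

print(two_sample_t(10, 2, 16, 9, 2, 16, equal_var=True))  # t is about 1.414, d.f. = 30
```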




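CHAPTER 9

Correlation Coefficient: $r = \dfrac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$

t-Test for the Correlation Coefficient: $t = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$ (d.f. $= n - 2$)

Equation of a Regression Line: $\hat{y} = mx + b$, where $m = \dfrac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$ and $b = \bar{y} - m\bar{x} = \dfrac{\sum y}{n} - m\dfrac{\sum x}{n}$

Coefficient of Determination: $r^2 = \dfrac{\text{Explained variation}}{\text{Total variation}} = \dfrac{\sum (\hat{y}_i - \bar{y})^2}{\sum (y_i - \bar{y})^2}$

Standard Error of Estimate: $s_e = \sqrt{\dfrac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$

c-Prediction Interval for y: $\hat{y} - E < y < \hat{y} + E$, where $E = t_c s_e \sqrt{1 + \dfrac{1}{n} + \dfrac{n(x_0 - \bar{x})^2}{n\sum x^2 - (\sum x)^2}}$ (d.f. $= n - 2$)

CHAPTER 10

Chi-Square: $\chi^2 = \sum \dfrac{(O - E)^2}{E}$

Goodness-of-Fit Test: d.f. $= k - 1$

Independence Test: d.f. $=$ (no. of rows $- 1$)(no. of columns $- 1$)

Two-Sample F-Test for Variances: $F = \dfrac{s_1^2}{s_2^2}$, where $s_1^2 \ge s_2^2$, d.f.$_N = n_1 - 1$, and d.f.$_D = n_2 - 1$.

One-Way Analysis of Variance Test: $F = \dfrac{MS_B}{MS_W}$, where $MS_B = \dfrac{SS_B}{\text{d.f.}_N} = \dfrac{\sum n_i(\bar{x}_i - \bar{\bar{x}})^2}{k - 1}$ and $MS_W = \dfrac{SS_W}{\text{d.f.}_D} = \dfrac{\sum (n_i - 1)s_i^2}{N - k}$ (d.f.$_N = k - 1$, d.f.$_D = N - k$)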
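The Chapter 9 and Chapter 10 formulas, sketched in Python with hypothetical helper names and a small made-up data set. The ANOVA helper computes SS_W as the sum of squared deviations within each group, which equals Σ(nᵢ - 1)sᵢ²; critical values for t, χ², and F still come from the tables.

```python
import math

def _sums(x, y):
    """Return n, sum x, sum y, sum xy, sum x^2, sum y^2 for paired data."""
    n = len(x)
    return (n, sum(x), sum(y),
            sum(a * b for a, b in zip(x, y)),
            sum(a * a for a in x),
            sum(b * b for b in y))

def correlation(x, y):
    n, sx, sy, sxy, sxx, syy = _sums(x, y)
    return (n * sxy - sx * sy) / (math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2))

def t_for_correlation(r, n):
    """Returns (t, d.f.) for testing whether the population correlation is 0."""
    return r / math.sqrt((1 - r ** 2) / (n - 2)), n - 2

def regression_line(x, y):
    """Slope m and y-intercept b of y-hat = m x + b."""
    n, sx, sy, sxy, sxx, _ = _sums(x, y)
    m = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b = sy / n - m * sx / n
    return m, b

def standard_error_of_estimate(x, y):
    m, b = regression_line(x, y)
    sse = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (len(x) - 2))

def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories or cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def one_way_anova_f(groups):
    """Returns (F, (d.f._N, d.f._D)) for k groups with N observations in total."""
    k = len(groups)
    N = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]
    ss_b = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Squared deviations within each group, summed; equals sum of (n_i - 1) s_i^2.
    ss_w = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, means))
    ms_b = ss_b / (k - 1)
    ms_w = ss_w / (N - k)
    return ms_b / ms_w, (k - 1, N - k)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(regression_line(x, y))  # slope 0.6, intercept 2.2 (up to floating-point rounding)
```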


Table 4— Standard Normal Distribution 


z     .09    .08    .07    .06    .05    .04    .03    .02    .01    .00
-3.4  .0002  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0003
-3.3  .0003  .0004  .0004  .0004  .0004  .0004  .0004  .0005  .0005  .0005
-3.2  .0005  .0005  .0005  .0006  .0006  .0006  .0006  .0006  .0007  .0007
-3.1  .0007  .0007  .0008  .0008  .0008  .0008  .0009  .0009  .0009  .0010
-3.0  .0010  .0010  .0011  .0011  .0011  .0012  .0012  .0013  .0013  .0013
-2.9  .0014  .0014  .0015  .0015  .0016  .0016  .0017  .0018  .0018  .0019
-2.8  .0019  .0020  .0021  .0021  .0022  .0023  .0023  .0024  .0025  .0026
-2.7  .0026  .0027  .0028  .0029  .0030  .0031  .0032  .0033  .0034  .0035
-2.6  .0036  .0037  .0038  .0039  .0040  .0041  .0043  .0044  .0045  .0047
-2.5  .0048  .0049  .0051  .0052  .0054  .0055  .0057  .0059  .0060  .0062
-2.4  .0064  .0066  .0068  .0069  .0071  .0073  .0075  .0078  .0080  .0082
-2.3  .0084  .0087  .0089  .0091  .0094  .0096  .0099  .0102  .0104  .0107
-2.2  .0110  .0113  .0116  .0119  .0122  .0125  .0129  .0132  .0136  .0139
-2.1  .0143  .0146  .0150  .0154  .0158  .0162  .0166  .0170  .0174  .0179
-2.0  .0183  .0188  .0192  .0197  .0202  .0207  .0212  .0217  .0222  .0228
-1.9  .0233  .0239  .0244  .0250  .0256  .0262  .0268  .0274  .0281  .0287
-1.8  .0294  .0301  .0307  .0314  .0322  .0329  .0336  .0344  .0351  .0359
-1.7  .0367  .0375  .0384  .0392  .0401  .0409  .0418  .0427  .0436  .0446
-1.6  .0455  .0465  .0475  .0485  .0495  .0505  .0516  .0526  .0537  .0548
-1.5  .0559  .0571  .0582  .0594  .0606  .0618  .0630  .0643  .0655  .0668
-1.4  .0681  .0694  .0708  .0721  .0735  .0749  .0764  .0778  .0793  .0808
-1.3  .0823  .0838  .0853  .0869  .0885  .0901  .0918  .0934  .0951  .0968
-1.2  .0985  .1003  .1020  .1038  .1056  .1075  .1093  .1112  .1131  .1151
-1.1  .1170  .1190  .1210  .1230  .1251  .1271  .1292  .1314  .1335  .1357
-1.0  .1379  .1401  .1423  .1446  .1469  .1492  .1515  .1539  .1562  .1587
-0.9  .1611  .1635  .1660  .1685  .1711  .1736  .1762  .1788  .1814  .1841
-0.8  .1867  .1894  .1922  .1949  .1977  .2005  .2033  .2061  .2090  .2119
-0.7  .2148  .2177  .2206  .2236  .2266  .2296  .2327  .2358  .2389  .2420
-0.6  .2451  .2483  .2514  .2546  .2578  .2611  .2643  .2676  .2709  .2743
-0.5  .2776  .2810  .2843  .2877  .2912  .2946  .2981  .3015  .3050  .3085
-0.4  .3121  .3156  .3192  .3228  .3264  .3300  .3336  .3372  .3409  .3446
-0.3  .3483  .3520  .3557  .3594  .3632  .3669  .3707  .3745  .3783  .3821
-0.2  .3859  .3897  .3936  .3974  .4013  .4052  .4090  .4129  .4168  .4207
-0.1  .4247  .4286  .4325  .4364  .4404  .4443  .4483  .4522  .4562  .4602
-0.0  .4641  .4681  .4721  .4761  .4801  .4840  .4880  .4920  .4960  .5000


Critical Values 


Level of Confidence, c    z_c
0.80                      1.28
0.90                      1.645
0.95                      1.96
0.99                      2.575


Table 4— Standard Normal Distribution (continued) 


z     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
0.0   .5000  .5040  .5080  .5120  .5160  .5199  .5239  .5279  .5319  .5359
0.1   .5398  .5438  .5478  .5517  .5557  .5596  .5636  .5675  .5714  .5753
0.2   .5793  .5832  .5871  .5910  .5948  .5987  .6026  .6064  .6103  .6141
0.3   .6179  .6217  .6255  .6293  .6331  .6368  .6406  .6443  .6480  .6517
0.4   .6554  .6591  .6628  .6664  .6700  .6736  .6772  .6808  .6844  .6879
0.5   .6915  .6950  .6985  .7019  .7054  .7088  .7123  .7157  .7190  .7224
0.6   .7257  .7291  .7324  .7357  .7389  .7422  .7454  .7486  .7517  .7549
0.7   .7580  .7611  .7642  .7673  .7704  .7734  .7764  .7794  .7823  .7852
0.8   .7881  .7910  .7939  .7967  .7995  .8023  .8051  .8078  .8106  .8133
0.9   .8159  .8186  .8212  .8238  .8264  .8289  .8315  .8340  .8365  .8389
1.0   .8413  .8438  .8461  .8485  .8508  .8531  .8554  .8577  .8599  .8621
1.1   .8643  .8665  .8686  .8708  .8729  .8749  .8770  .8790  .8810  .8830
1.2   .8849  .8869  .8888  .8907  .8925  .8944  .8962  .8980  .8997  .9015
1.3   .9032  .9049  .9066  .9082  .9099  .9115  .9131  .9147  .9162  .9177
1.4   .9192  .9207  .9222  .9236  .9251  .9265  .9279  .9292  .9306  .9319
1.5   .9332  .9345  .9357  .9370  .9382  .9394  .9406  .9418  .9429  .9441
1.6   .9452  .9463  .9474  .9484  .9495  .9505  .9515  .9525  .9535  .9545
1.7   .9554  .9564  .9573  .9582  .9591  .9599  .9608  .9616  .9625  .9633
1.8   .9641  .9649  .9656  .9664  .9671  .9678  .9686  .9693  .9699  .9706
1.9   .9713  .9719  .9726  .9732  .9738  .9744  .9750  .9756  .9761  .9767
2.0   .9772  .9778  .9783  .9788  .9793  .9798  .9803  .9808  .9812  .9817
2.1   .9821  .9826  .9830  .9834  .9838  .9842  .9846  .9850  .9854  .9857
2.2   .9861  .9864  .9868  .9871  .9875  .9878  .9881  .9884  .9887  .9890
2.3   .9893  .9896  .9898  .9901  .9904  .9906  .9909  .9911  .9913  .9916
2.4   .9918  .9920  .9922  .9925  .9927  .9929  .9931  .9932  .9934  .9936
2.5   .9938  .9940  .9941  .9943  .9945  .9946  .9948  .9949  .9951  .9952
2.6   .9953  .9955  .9956  .9957  .9959  .9960  .9961  .9962  .9963  .9964
2.7   .9965  .9966  .9967  .9968  .9969  .9970  .9971  .9972  .9973  .9974
2.8   .9974  .9975  .9976  .9977  .9977  .9978  .9979  .9979  .9980  .9981
2.9   .9981  .9982  .9982  .9983  .9984  .9984  .9985  .9985  .9986  .9986
3.0   .9987  .9987  .9987  .9988  .9988  .9989  .9989  .9989  .9990  .9990
3.1   .9990  .9991  .9991  .9991  .9992  .9992  .9992  .9992  .9993  .9993
3.2   .9993  .9993  .9994  .9994  .9994  .9994  .9994  .9995  .9995  .9995
3.3   .9995  .9995  .9995  .9996  .9996  .9996  .9996  .9996  .9996  .9997
3.4   .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9998


Table 5— t-Distribution 


Level of confidence, c   0.80    0.90    0.95    0.98    0.99
One tail, α              0.10    0.05    0.025   0.01    0.005
d.f.   Two tails, α      0.20    0.10    0.05    0.02    0.01
1                        3.078   6.314   12.706  31.821  63.657
2                        1.886   2.920   4.303   6.965   9.925
3                        1.638   2.353   3.182   4.541   5.841
4                        1.533   2.132   2.776   3.747   4.604
5                        1.476   2.015   2.571   3.365   4.032
6                        1.440   1.943   2.447   3.143   3.707
7                        1.415   1.895   2.365   2.998   3.499
8                        1.397   1.860   2.306   2.896   3.355
9                        1.383   1.833   2.262   2.821   3.250
10                       1.372   1.812   2.228   2.764   3.169
11                       1.363   1.796   2.201   2.718   3.106
12                       1.356   1.782   2.179   2.681   3.055
13                       1.350   1.771   2.160   2.650   3.012
14                       1.345   1.761   2.145   2.624   2.977
15                       1.341   1.753   2.131   2.602   2.947
16                       1.337   1.746   2.120   2.583   2.921
17                       1.333   1.740   2.110   2.567   2.898
18                       1.330   1.734   2.101   2.552   2.878
19                       1.328   1.729   2.093   2.539   2.861
20                       1.325   1.725   2.086   2.528   2.845
21                       1.323   1.721   2.080   2.518   2.831
22                       1.321   1.717   2.074   2.508   2.819
23                       1.319   1.714   2.069   2.500   2.807
24                       1.318   1.711   2.064   2.492   2.797
25                       1.316   1.708   2.060   2.485   2.787
26                       1.315   1.706   2.056   2.479   2.779
27                       1.314   1.703   2.052   2.473   2.771
28                       1.313   1.701   2.048   2.467   2.763
29                       1.311   1.699   2.045   2.462   2.756
30                       1.310   1.697   2.042   2.457   2.750
31                       1.309   1.696   2.040   2.453   2.744
32                       1.309   1.694   2.037   2.449   2.738
33                       1.308   1.692   2.035   2.445   2.733
34                       1.307   1.691   2.032   2.441   2.728
35                       1.306   1.690   2.030   2.438   2.724
36                       1.306   1.688   2.028   2.434   2.719
37                       1.305   1.687   2.026   2.431   2.715
38                       1.304   1.686   2.024   2.429   2.712
39                       1.304   1.685   2.023   2.426   2.708
40                       1.303   1.684   2.021   2.423   2.704
45                       1.301   1.679   2.014   2.412   2.690
50                       1.299   1.676   2.009   2.403   2.678
60                       1.296   1.671   2.000   2.390   2.660
70                       1.294   1.667   1.994   2.381   2.648
80                       1.292   1.664   1.990   2.374   2.639
90                       1.291   1.662   1.987   2.368   2.632
100                      1.290   1.660   1.984   2.364   2.626
500                      1.283   1.648   1.965   2.334   2.586
1000                     1.282   1.646   1.962   2.330   2.581
∞                        1.282   1.645   1.960   2.326   2.576


[Figure: t-distribution areas for a c-confidence interval, a left-tailed test, a right-tailed test, and a two-tailed test]


Table 6— Chi-Square Distribution 


4 
a x he Me = 
Right tail Two tails 
Degrees of a@ 
freedom 0.995 0.99 0.975 0.95 0.90 0.10 0.05 0.025 0.01 0.005 
1 — _— 0.001 0.004 0.016 2.706 3.841 5.024 6.635 7.879 
2 0.010 0.020 0.051 0.103 0.211 4.605 5.991 7.378 9210 10597 
3 0.072 0.115 0.216 0.352 0.584 6.251 7.815 9.348 11.345 12.838 
4 0.207 0.297 0.484 0.711 1.064 WSIS) 9488 11.143 13.277 14.860 
5 0.412 0.554 0.831 1.145 1.610 9.236 11.071 12.833 15.086 16.750 
6 0.676 0.872 1237/ 1.635 2.204 10.645 12.592 14449 16812 18.548 
7 0.989 1.239 1.690 2.167 2.833 12.017 14.067 16.013 18475 20.278 
8 1.344 1.646 2.180 Pa If S33} 3.490 13.362 15.507 17.535 20.090 21.955 
9 1.735 2.088 2.700 3.325 4.168 14.684 16.919 19.023 21.666 23.589 
10 2.156 2.558 3.247 3.940 4.865 15.987 18.307 20.483 23.209 25.188
11 2.603 3.053 3.816 4.575 5.578 17.275 19.675 21.920 24.725 26.757 
12 3.074 3.571 4.404 5.226 6.304 18.549 21.026 23.337 26.217 28.299
13 3.565 4.107 5.009 5.892 7.042 19.812 22.362 24.736 27.688 29.819 
14 4.075 4.660 5.629 6.571 7.790 21.064 23.685 26.119 29.141 31.319 
15 4.601 5.229 6.262 7.261 8.547 22.307 24.996 27.488 30.578 32.801 
16 5.142 5.812 6.908 7.962 9.312 23.542 26.296 28.845 32.000 34.267
17 5.697 6.408 7.564 8.672 10.085 24.769 27.587 30.191 33.409 35.718 
18 6.265 7.015 8.231 9.390 10.865 25.989 28.869 31.526 34.805 37.156 
19 6.844 7.633 8.907 10.117 11.651 27.204 30.144 32.852 36.191 38.582
20 7.434 8.260 9.591 10.851 12.443 28.412 31.410 34.170 37.566 39.997
21 8.034 8.897 10.283 11.591 13.240 29.615 32.671 35.479 38.932 41.401
22 8.643 9.542 10.982 12.338 14.042 30.813 33.924 36.781 40.289 42.796 
23 9.260 10.196 11.689 13.091 14.848 32.007 35.172 38.076 41.638 44.181
24 9.886 10.856 12.401 13.848 15.659 33.196 36.415 39.364 42.980 45.559
25 10.520 11.524 13.120 14.611 16.473 34.382 37.652 40.646 44.314 46.928
26 11.160 12.198 13.844 15.379 17.292 35.563 38.885 41.923 45.642 48.290 
27 11.808 12.879 14.573 16.151 18.114 36.741 40.113 43.194 46.963 49.645 
28 12.461 13.565 15.308 16.928 18.939 37.916 41.337 44.461 48.278 50.993
29 13.121 14.257 16.047 17.708 19.768 39.087 42.557 45.722 49.588 52.336 
30 13.787 14.954 16.791 18.493 20.599 40.256 43.773 46.979 50.892 53.672
40 20.707 22.164 24.433 26.509 29.051 51.805 55.758 59.342 63.691 66.766
50 27.991 29.707 32.357 34.764 37.689 63.167 67.505 71.420 76.154 79.490 
60 35.534 37.485 40.482 43.188 46.459 74.397 79.082 83.298 88.379 91.952
70 43.275 45.442 48.758 51.739 55.329 85.527 90.531 95.023 100.425 104.215
80 51.172 53.540 57.153 60.391 64.278 96.578 101.879 106.629 112.329 116.321 
90 59.196 61.754 65.647 69.126 73.291 107.565 113.145 118.136 124.116 128.299 
100 67.328 70.065 74.222 77.929 82.358 118.498 124.342 129.561 135.807 140.169 
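Each entry in Table 6 is the chi-square value whose right-tail area, for the given degrees of freedom, equals α; for a two-tailed test one reads the α/2 and 1 - α/2 columns. As a hedged sketch using only the Python standard library, the entries can be recomputed by evaluating the chi-square CDF through the power series of the regularized lower incomplete gamma function and then bisecting. The names `chi2_cdf` and `chi2_critical` are illustrative, and this is not necessarily how the published table was computed.

```python
import math


def chi2_cdf(x: float, df: int) -> float:
    """P(X <= x) for a chi-square variable with df degrees of freedom.

    Evaluates the regularized lower incomplete gamma function P(s, z),
    with s = df / 2 and z = x / 2, by its power series.
    """
    if x <= 0:
        return 0.0
    s, z = df / 2, x / 2
    term = total = 1.0
    n = 0
    while term > 1e-17 * total:  # all terms are positive; stop once negligible
        n += 1
        term *= z / (s + n)
        total += term
    return math.exp(s * math.log(z) - z - math.lgamma(s + 1)) * total


def chi2_critical(alpha: float, df: int) -> float:
    """Return the chi-square value whose right-tail area is alpha, by bisection."""
    def right_tail(x: float) -> float:
        return 1.0 - chi2_cdf(x, df)

    hi = 1.0
    while right_tail(hi) > alpha:  # widen the interval until the value is bracketed
        hi *= 2
    lo = 0.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if right_tail(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    alphas = (0.995, 0.99, 0.975, 0.95, 0.90, 0.10, 0.05, 0.025, 0.01, 0.005)
    print(10, *(f"{chi2_critical(a, 10):.3f}" for a in alphas))
```

A quick sanity check: with 2 degrees of freedom the right-tail area is exactly e^(-x/2), so the critical value is -2 ln α (for α = 0.01 this is about 9.210, matching the table).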


Elementary Statistics: Picturing the World uses step-by-step instructions and carefully 
developed features to help students use statistics to describe and think about the 
world. The seventh Global Edition has been thoroughly updated and includes the following features:


Real-life examples illustrate concepts and promote critical thinking skills. 


A huge variety of assessment features, including section exercises, Review Exercises, 
Chapter Quizzes, and Chapter Tests, allow students to test their understanding of 
concepts. 


Tech Tips are new to this edition and show how to use technology to solve a problem.


Case Studies and Picturing the World sections feature real-world data and elucidate
important concepts.


Uses and Abuses: Statistics in the Real World and Real Statistics—Real
Decisions: Putting It All Together guide students on the correct use of statistical
techniques and encourage them to make informed decisions about real-world data.


MyLab Statistics offers resources that help instructors assess and improve student 
results, and its multimedia resources and exercises provide personalized learning 
experiences for each student. 


This is a special edition of an established title widely used by colleges and universities throughout the world.


ISBN-10 1-292-26046-7
ISBN-13 978-1-292-26046-4


